---
title: Reputation Monitor
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---

# 📊 End-to-End MLOps Pipeline for Real-Time Reputation Monitoring

![Build Status](https://img.shields.io/badge/build-passing-brightgreen)
![Python](https://img.shields.io/badge/python-3.9%2B-blue)
![Model](https://img.shields.io/badge/model-RoBERTa-yellow)
![Deployment](https://img.shields.io/badge/deployed%20on-HuggingFace-orange)
![License](https://img.shields.io/badge/license-MIT-green)

### 👤 Author

**Fabio Celaschi**

<a href="https://www.linkedin.com/in/fabio-celaschi-4371bb92">
  <img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn" />
</a>

<a href="https://www.instagram.com/fabiocelaschi/">
  <img src="https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&logo=instagram&logoColor=white" alt="Instagram" />
</a>

## 🚀 Project Overview

This project is a comprehensive **MLOps solution** designed to monitor online company reputation through automated sentiment analysis of real-time news. It was developed to demonstrate **scalable, production-ready machine learning engineering** capabilities.

Unlike standard static notebooks, this repository demonstrates a **full-cycle ML workflow**. The system scrapes live data from **Google News**, analyzes sentiment using a **RoBERTa Transformer** model, and visualizes insights via an interactive dashboard, all orchestrated within a Dockerized environment.

### Key Features
* **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords.
* **State-of-the-Art NLP:** Uses the pre-trained `twitter-roberta-base-sentiment` model to classify each text as negative, neutral, or positive.
* **Full-Stack Architecture:** Integrates a **FastAPI** backend for inference and a **Streamlit** frontend for visualization in a single container.
* **Automated Continuous Training (CT):** Implements pipeline logic that checks for new data and simulates model fine-tuning during CI/CD execution.
* **CI/CD Automation:** Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
* **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time.
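
To make the classification feature more concrete, here is a minimal sketch of how raw model outputs could be turned into a sentiment label. It assumes the class order (negative, neutral, positive) described for `twitter-roberta-base-sentiment`; the function name `logits_to_sentiment` and the sample logits are hypothetical, and the model call itself is left out so the snippet runs without downloading any weights.

```python
import math

# Assumed class order for the three-class sentiment head (index 0, 1, 2).
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits):
    """Return (label, confidence) for one set of three logits."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits for a single news headline.
label, confidence = logits_to_sentiment([-1.2, 0.3, 2.1])
print(label, round(confidence, 3))
```

In the real application the logits would come from the Transformers model; the mapping step shown here is independent of how they are produced.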

---

## 🛠️ Tech Stack & Tools

* **Core:** Python 3.9+
* **Machine Learning:** Hugging Face Transformers, PyTorch, Scikit-learn.
* **Backend:** FastAPI, Uvicorn (REST API).
* **Frontend:** Streamlit (Interactive Dashboard).
* **Data Ingestion:** `GoogleNews` library (Real-time scraping).
* **DevOps:** Docker, GitHub Actions (CI/CD).
* **Deployment:** Hugging Face Spaces (Docker SDK).

---

## ⚙️ Architecture & MLOps Workflow

The project follows a rigorous MLOps pipeline to ensure reliability and speed of delivery:

1.  **Data & Modeling:**
    * **Input:** Real-time news titles and descriptions fetched dynamically.
    * **Model:** Pre-trained **RoBERTa** model optimized for social media and short-text sentiment.

2.  **Containerization (Docker):**
    * The application is containerized using a custom `Dockerfile`.
    * A custom `entrypoint.sh` script starts both the **FastAPI backend** (port 8000) and the **Streamlit frontend** (port 7860) in the same container.

3.  **CI/CD Pipeline (GitHub Actions):**
    * **Trigger:** Pushes to the `main` branch.
    * **Continuous Training:** Checks the `data/` directory for new labeled datasets. If found, initiates a training simulation to demonstrate the retraining lifecycle.
    * **Test:** Executes the `pytest` suite to verify the API endpoints (`/health`, `/analyze`) and model loading.
    * **Build:** Verifies Docker image creation.
    * **Deploy:** Automatically pushes the validated code to Hugging Face Spaces.

4.  **Monitoring:**
    * The system logs every prediction to a local CSV file, which is visualized in the "Monitoring" tab of the dashboard.
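
The monitoring step above can be sketched as follows. This is an illustration only, not the project's actual logging code: the column layout in `FIELDS`, the file name `predictions.csv`, and the helper names `log_prediction` and `sentiment_distribution` are all assumptions.

```python
import csv
import os
import tempfile
from collections import Counter
from datetime import datetime, timezone

FIELDS = ["timestamp", "text", "label", "score"]  # hypothetical column layout


def log_prediction(path, text, label, score):
    """Append one prediction to the CSV log, writing the header on first use."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "text": text,
            "label": label,
            "score": f"{score:.4f}",
        })


def sentiment_distribution(path):
    """Count logged predictions per label, e.g. for a dashboard bar chart."""
    with open(path, newline="", encoding="utf-8") as handle:
        return Counter(row["label"] for row in csv.DictReader(handle))


# Demo against a temporary file so nothing is written into the project tree.
log_path = os.path.join(tempfile.mkdtemp(), "predictions.csv")
log_prediction(log_path, "Brand X launches new product", "positive", 0.91)
log_prediction(log_path, "Brand X faces lawsuit", "negative", 0.87)
log_prediction(log_path, "Brand X wins award", "positive", 0.78)
distribution = sentiment_distribution(log_path)
print(dict(distribution))
```

Because the file lives on the container's local disk, the log is lost whenever the container is recreated.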

---

## 📂 Repository Structure

```bash
├── .github/workflows/   # CI/CD configurations (GitHub Actions)
├── app/                 # Backend Application Code
│   ├── api/             # FastAPI endpoints (main.py)
│   ├── model/           # Model loader logic (RoBERTa)
│   └── services/        # Google News scraping logic
├── data/                # Dataset storage for retraining
├── streamlit_app/       # Frontend Application Code (app.py)
├── src/                 # Training scripts (Simulation)
├── tests/               # Unit and integration tests (Pytest)
├── Dockerfile           # Container configuration
├── entrypoint.sh        # Startup script for dual-process execution
├── requirements.txt     # Project dependencies
├── Appunti_Progetto.doc # Notes and explanation of the project
└── README.md            # Project documentation
```

## 💻 Installation & Usage

To run this project locally using Docker (recommended):

### 1. Clone the repository
```bash
git clone https://github.com/YOUR_USERNAME/SentimentAnalysis.git
cd SentimentAnalysis
```

### 2. Build the Docker Image
```bash
docker build -t reputation-monitor .
```

### 3. Run the Container
```bash
docker run -p 7860:7860 reputation-monitor
```

Access the application at http://localhost:7860

### Manual Installation (No Docker)

If you prefer running it directly with Python:

1. Install dependencies:

    ```bash
    pip install -r requirements.txt
    ```

2. Start the Backend (FastAPI):

    ```bash
    uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload
    ```

3. Start the Frontend (Streamlit) in a new terminal:

    ```bash
    streamlit run streamlit_app/app.py
    ```

## ⚠️ Limitations & Future Roadmap

* **Data Persistence:** Currently, monitoring logs are stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts.
* **Scalability:** The current Google News scraper is synchronous. Future versions will implement asynchronous scraping (`aiohttp`) or a message queue (RabbitMQ/Celery) for high-volume processing.
* **Model Retraining:** A placeholder pipeline (`src/train.py`) is included. Full implementation would require GPU resources and a labeled dataset for fine-tuning.
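
As a rough illustration of the retraining gate, the sketch below shows one way a pipeline could decide whether new data has arrived before triggering training. It is not the contents of `src/train.py`: the manifest file `trained_files.txt`, the restriction to `*.csv` files, and both function names are assumptions, and the fine-tuning step itself is only a placeholder comment.

```python
import tempfile
from pathlib import Path


def find_new_datasets(data_dir, manifest_name="trained_files.txt"):
    """Return CSV files in data_dir that are not yet listed in the manifest."""
    data_dir = Path(data_dir)
    manifest = data_dir / manifest_name
    seen = set()
    if manifest.exists():
        seen = set(manifest.read_text(encoding="utf-8").splitlines())
    return sorted(p for p in data_dir.glob("*.csv") if p.name not in seen)


def run_training_gate(data_dir, manifest_name="trained_files.txt"):
    """Simulate retraining only when unseen datasets exist, then record them."""
    new_files = find_new_datasets(data_dir, manifest_name)
    if not new_files:
        return []
    # Placeholder: a real pipeline would launch fine-tuning here.
    manifest = Path(data_dir) / manifest_name
    with manifest.open("a", encoding="utf-8") as handle:
        for path in new_files:
            handle.write(path.name + "\n")
    return [p.name for p in new_files]


# Demo in a temporary directory standing in for data/.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "reviews_2024.csv").write_text(
    "text,label\ngreat,positive\n", encoding="utf-8"
)
first_run = run_training_gate(demo_dir)   # sees the new file
second_run = run_training_gate(demo_dir)  # nothing new the second time
print(first_run, second_run)
```

Recording processed files in a manifest keeps repeated CI runs from retraining on data that has already been seen.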

## 🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## 📝 License

Distributed under the MIT License. See `LICENSE` for more information.