Spaces:
Sleeping
Sleeping
| title: Reputation Monitor | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| pinned: false | |
| app_port: 7860 | |
| # π End-to-End MLOps Pipeline for Real-Time Reputation Monitoring | |
|  | |
|  | |
|  | |
|  | |
|  | |
| ### π€ Author | |
| **Fabio Celaschi** | |
| <a href="https://www.linkedin.com/in/fabio-celaschi-4371bb92"> | |
| <img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn" /> | |
| </a> | |
| <a href="https://www.instagram.com/fabiocelaschi/"> | |
| <img src="https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&logo=instagram&logoColor=white" alt="Instagram" /> | |
| </a> | |
| ## π Project Overview | |
| This project is a comprehensive **MLOps solution** designed to monitor online company reputation through automated sentiment analysis of real-time news. It was developed to demonstrate **scalable, production-ready machine learning engineering** capabilities. | |
| Unlike standard static notebooks, this repository demonstrates a **full-cycle ML workflow**. The system scrapes live data from **Google News**, analyzes sentiment using a **RoBERTa Transformer** model, and visualizes insights via an interactive dashboard, all orchestrated within a Dockerized environment. | |
| ### Key Features | |
| * **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords. | |
| * **State-of-the-Art NLP:** Utilizes `twitter-roberta-base-sentiment` for high-accuracy classification. | |
| * **Full-Stack Architecture:** Integrates a **FastAPI** backend for inference and a **Streamlit** frontend for visualization in a single container. | |
| * **Automated Continuous Training (CT):** Implements a pipeline logic that checks for new data and simulates model fine-tuning during CI/CD execution. | |
| * **CI/CD Automation:** Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces. | |
| * **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time. | |
| --- | |
| ## π οΈ Tech Stack & Tools | |
| * **Core:** Python 3.9+ | |
| * **Machine Learning:** Hugging Face Transformers, PyTorch, Scikit-learn. | |
| * **Backend:** FastAPI, Uvicorn (REST API). | |
| * **Frontend:** Streamlit (Interactive Dashboard). | |
| * **Data Ingestion:** `GoogleNews` library (Real-time scraping). | |
| * **DevOps:** Docker, GitHub Actions (CI/CD). | |
| * **Deployment:** Hugging Face Spaces (Docker SDK). | |
| --- | |
| ## βοΈ Architecture & MLOps Workflow | |
| The project follows a rigorous MLOps pipeline to ensure reliability and speed of delivery: | |
| 1. **Data & Modeling:** | |
| * **Input:** Real-time news titles and descriptions fetched dynamically. | |
| * **Model:** Pre-trained **RoBERTa** model optimized for social media and short-text sentiment. | |
| 2. **Containerization (Docker):** | |
| * The application is containerized using a custom `Dockerfile`. | |
| * Implements a custom `entrypoint.sh` script to run both the **FastAPI backend** (port 8000) and **Streamlit frontend** (port 7860) simultaneously. | |
| 3. **CI/CD Pipeline (GitHub Actions):** | |
| * **Trigger:** Pushes to the `main` branch. | |
| * **Continuous Training:** Checks the `data/` directory for new labeled datasets. If found, initiates a training simulation to demonstrate the retraining lifecycle. | |
| * **Test:** Executes `pytest` suite to verify API endpoints (`/health`, `/analyze`) and model loading. | |
| * **Build:** Verifies Docker image creation. | |
| * **Deploy:** Automatically pushes the validated code to Hugging Face Spaces. | |
| 4. **Monitoring:** | |
| * The system logs every prediction to a local CSV file, which is visualized in the "Monitoring" tab of the dashboard. | |
| --- | |
| ## π Repository Structure | |
| ```bash | |
| βββ .github/workflows/ # CI/CD configurations (GitHub Actions) | |
| βββ app/ # Backend Application Code | |
| β βββ api/ # FastAPI endpoints (main.py) | |
| β βββ model/ # Model loader logic (RoBERTa) | |
| β βββ services/ # Google News scraping logic | |
| βββ data/ # Dataset storage for retraining | |
| βββ streamlit_app/ # Frontend Application Code (app.py) | |
| βββ src/ # Training scripts (Simulation) | |
| βββ tests/ # Unit and integration tests (Pytest) | |
| βββ Dockerfile # Container configuration | |
| βββ entrypoint.sh # Startup script for dual-process execution | |
| βββ requirements.txt # Project dependencies | |
| βββ Appunti_Progetto.doc # Note and explanation of the project | |
| βββ README.md # Project documentation | |
| π» Installation & Usage | |
| To run this project locally using Docker (Recommended): | |
| ### 1. Clone the repository | |
| ```bash | |
| git clone [https://github.com/YOUR_USERNAME/SentimentAnalysis.git](https://github.com/YOUR_USERNAME/SentimentAnalysis.git) | |
| cd SentimentAnalysis | |
| ### 2. Build the Docker Image | |
| ```bash | |
| docker build -t reputation-monitor . | |
| ### 3. Run the Container | |
| ```bash | |
| docker run -p 7860:7860 reputation-monitor | |
| Access the application at http://localhost:7860 | |
| Manual Installation (No Docker): | |
| If you prefer running it directly with Python: | |
| 1. Install dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| 2. Start the Backend (FastAPI): | |
| ```bash | |
| uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload | |
| 3. Start the Frontend (Streamlit) in a new terminal: | |
| ```bash | |
| streamlit run streamlit_app/app.py | |
| β οΈ Limitations & Future Roadmap | |
| Data Persistence: Currently, monitoring logs are stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts. | |
| Scalability: The current Google News scraper is synchronous. Future versions will implement asynchronous scraping (aiohttp) or a message queue (RabbitMQ/Celery) for high-volume processing. | |
| Model Retraining: A placeholder pipeline (src/train.py) is included. Full implementation would require GPU resources and a labeled dataset for fine-tuning. | |
| π€ Contributing | |
| Contributions are welcome! Please feel free to submit a Pull Request. | |
| π License | |
| Distributed under the MIT License. See LICENSE for more information. | |