---
title: Reputation Monitor
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 7860
---
# End-to-End MLOps Pipeline for Real-Time Reputation Monitoring

### Author
**Fabio Celaschi**
<a href="https://www.linkedin.com/in/fabio-celaschi-4371bb92">
<img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn" />
</a>
<a href="https://www.instagram.com/fabiocelaschi/">
<img src="https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&logo=instagram&logoColor=white" alt="Instagram" />
</a>

## Project Overview
This project is a comprehensive **MLOps solution** for monitoring a company's online reputation through automated sentiment analysis of real-time news. It was developed to demonstrate **scalable, production-ready machine learning engineering**.

Unlike a static notebook, this repository covers a **full-cycle ML workflow**: the system scrapes live data from **Google News**, classifies sentiment with a **RoBERTa Transformer** model, and visualizes the results in an interactive dashboard, all orchestrated within a single Dockerized environment.
### Key Features
* **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords.
* **State-of-the-Art NLP:** Uses `twitter-roberta-base-sentiment` to classify each text as negative, neutral, or positive.
* **Full-Stack Architecture:** Integrates a **FastAPI** backend for inference and a **Streamlit** frontend for visualization in a single container.
* **Automated Continuous Training (CT):** Implements pipeline logic that checks for new data and simulates model fine-tuning during CI/CD execution.
* **CI/CD Automation:** GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
* **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time.
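
The classifier produces one raw score (logit) per class, which must be turned into a label and a confidence. The sketch below makes that step explicit. It assumes the README's model is the Cardiff NLP `twitter-roberta-base-sentiment` checkpoint, whose model card documents the label order 0 = negative, 1 = neutral, 2 = positive. The helper `logits_to_sentiment` is hypothetical and is not taken from the repository. The Transformers `pipeline` helper normally performs this softmax internally.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed label order for the three-class RoBERTa sentiment checkpoint;
# verify against the model's config (id2label) before relying on it.
LABELS = ("negative", "neutral", "positive")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits: Sequence[float]) -> Tuple[str, float, Dict[str, float]]:
    """Hypothetical helper: turn raw logits into (label, confidence, per-class probabilities)."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best], dict(zip(LABELS, probs))


if __name__ == "__main__":
    label, confidence, _ = logits_to_sentiment([-1.2, 0.3, 2.1])
    print(label, round(confidence, 3))
```

Running the script prints the winning label and its probability. In this example the label is `positive`, with a probability of about 0.83.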
---
## Tech Stack & Tools
* **Core:** Python 3.9+
* **Machine Learning:** Hugging Face Transformers, PyTorch, Scikit-learn.
* **Backend:** FastAPI, Uvicorn (REST API).
* **Frontend:** Streamlit (Interactive Dashboard).
* **Data Ingestion:** `GoogleNews` library (Real-time scraping).
* **DevOps:** Docker, GitHub Actions (CI/CD).
* **Deployment:** Hugging Face Spaces (Docker SDK).
---
## Architecture & MLOps Workflow
The project follows a rigorous MLOps pipeline to ensure reliability and speed of delivery:
1. **Data & Modeling:**
   * **Input:** Real-time news titles and descriptions fetched dynamically.
   * **Model:** Pre-trained **RoBERTa** model optimized for social media and short-text sentiment.
2. **Containerization (Docker):**
   * The application is containerized using a custom `Dockerfile`.
   * A custom `entrypoint.sh` script runs the **FastAPI backend** (port 8000) and the **Streamlit frontend** (port 7860) simultaneously.
3. **CI/CD Pipeline (GitHub Actions):**
   * **Trigger:** Pushes to the `main` branch.
   * **Continuous Training:** Checks the `data/` directory for new labeled datasets and, if any are found, runs a training simulation to demonstrate the retraining lifecycle.
   * **Test:** Runs the `pytest` suite to verify the API endpoints (`/health`, `/analyze`) and model loading.
   * **Build:** Verifies that the Docker image builds successfully.
   * **Deploy:** Automatically pushes the validated code to Hugging Face Spaces.
4. **Monitoring:**
   * Every prediction is logged to a local CSV file, which is visualized in the "Monitoring" tab of the dashboard.
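
The monitoring log described above can be sketched with the standard library alone. Appending one row per prediction is enough to compute the sentiment distribution shown in the Monitoring tab. The file name `predictions_log.csv`, the column names, and both helper functions are assumptions for illustration. They are not the repository's actual schema.

```python
import csv
import os
from collections import Counter
from datetime import datetime, timezone

# Hypothetical log location and schema; the real app may use different names.
LOG_PATH = "predictions_log.csv"
FIELDS = ["timestamp", "text", "label", "score"]


def log_prediction(text: str, label: str, score: float, path: str = LOG_PATH) -> None:
    """Append one prediction as a CSV row, writing the header if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "text": text,
            "label": label,
            "score": f"{score:.4f}",
        })


def sentiment_distribution(path: str = LOG_PATH) -> dict:
    """Return the share of each label across all logged predictions."""
    if not os.path.exists(path):
        return {}
    with open(path, newline="", encoding="utf-8") as f:
        counts = Counter(row["label"] for row in csv.DictReader(f))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}
```

The CSV file lives inside the container's filesystem, so it is lost when the container is recreated. This is the data-persistence limitation listed under Limitations & Future Roadmap.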
---
## Repository Structure
```bash
├── .github/workflows/      # CI/CD configurations (GitHub Actions)
├── app/                    # Backend Application Code
│   ├── api/                # FastAPI endpoints (main.py)
│   ├── model/              # Model loader logic (RoBERTa)
│   └── services/           # Google News scraping logic
├── data/                   # Dataset storage for retraining
├── streamlit_app/          # Frontend Application Code (app.py)
├── src/                    # Training scripts (Simulation)
├── tests/                  # Unit and integration tests (Pytest)
├── Dockerfile              # Container configuration
├── entrypoint.sh           # Startup script for dual-process execution
├── requirements.txt        # Project dependencies
├── Appunti_Progetto.doc    # Project notes and explanation
└── README.md               # Project documentation
```

## Installation & Usage
To run this project locally using Docker (Recommended):
### 1. Clone the repository
```bash
git clone https://github.com/YOUR_USERNAME/SentimentAnalysis.git
cd SentimentAnalysis
```
### 2. Build the Docker Image
```bash
docker build -t reputation-monitor .
```
### 3. Run the Container
```bash
docker run -p 7860:7860 reputation-monitor
```
Access the application at http://localhost:7860
### Manual Installation (No Docker)
If you prefer running it directly with Python:
1. Install dependencies:
```bash
pip install -r requirements.txt
```
2. Start the Backend (FastAPI):
```bash
uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload
```
3. Start the Frontend (Streamlit) in a new terminal:
```bash
streamlit run streamlit_app/app.py
```
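
If the Streamlit frontend calls the FastAPI backend for inference, it helps to confirm that the backend is healthy before starting the frontend or running a smoke test. The sketch below polls the `/health` endpoint mentioned in the CI/CD section, using only the standard library. The helper name `wait_for_backend` is illustrative. So is the assumption that `/health` returns HTTP 200 once the model has loaded. Neither is taken from the repository's code.

```python
import http.client
import time
import urllib.request


def wait_for_backend(base_url: str = "http://localhost:8000",
                     timeout: float = 30.0,
                     interval: float = 0.5) -> bool:
    """Poll GET <base_url>/health until it answers HTTP 200 or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # URLError/HTTPError and connection errors are OSError subclasses:
            # the backend is not reachable yet (e.g. still loading the model).
            pass
        time.sleep(interval)
    return False
```

For example, `wait_for_backend(timeout=60)` returns `True` as soon as `http://localhost:8000/health` answers with status 200. It returns `False` if no such response arrives within 60 seconds.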

## Limitations & Future Roadmap
* **Data Persistence:** Monitoring logs are currently stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) so that data survives container restarts.
* **Scalability:** The current Google News scraper is synchronous. Future versions will implement asynchronous scraping (`aiohttp`) or a message queue (RabbitMQ/Celery) for high-volume processing.
* **Model Retraining:** A placeholder pipeline (`src/train.py`) is included. A full implementation would require GPU resources and a labeled dataset for fine-tuning.
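
Before launching the training step, the continuous-training stage only has to answer one question: has the labeled data in `data/` changed since the last run? One way to sketch that gate compares SHA-256 hashes of the dataset files against a JSON manifest. This assumes datasets are stored as CSV files. The manifest name `.data_manifest.json` and the helpers `find_new_datasets` and `record_manifest` are hypothetical and do not describe the repository's actual logic.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical manifest recording the hash of every dataset seen by the last training run.
MANIFEST_NAME = ".data_manifest.json"


def file_sha256(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large datasets are never fully loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def current_hashes(data_dir: str = "data") -> Dict[str, str]:
    """Map each CSV file name in data_dir to its SHA-256 digest."""
    root = Path(data_dir)
    if not root.is_dir():
        return {}
    return {p.name: file_sha256(p) for p in sorted(root.glob("*.csv")) if p.is_file()}


def find_new_datasets(data_dir: str = "data") -> List[str]:
    """Return CSV files that are new or modified since the manifest was last recorded."""
    manifest = Path(data_dir) / MANIFEST_NAME
    previous: Dict[str, str] = {}
    if manifest.exists():
        previous = json.loads(manifest.read_text(encoding="utf-8"))
    return [name for name, digest in current_hashes(data_dir).items()
            if previous.get(name) != digest]


def record_manifest(data_dir: str = "data") -> None:
    """Persist the current hashes; call this only after training has succeeded."""
    root = Path(data_dir)
    if root.is_dir():
        manifest = root / MANIFEST_NAME
        manifest.write_text(json.dumps(current_hashes(data_dir), indent=2), encoding="utf-8")
```

A CI step would call `find_new_datasets()` and run the training simulation only when the list is non-empty. It would then call `record_manifest()`. Keeping the two calls separate means a failed training run does not mark the data as already processed. On GitHub-hosted runners, each job starts from a fresh checkout. The manifest would therefore need to be committed or restored from a cache for the comparison to carry over between runs.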
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
Distributed under the MIT License. See LICENSE for more information.