	---
	title: Reputation Monitor
	emoji: 📊
	colorFrom: blue
	colorTo: indigo
	sdk: docker
	pinned: false
	app_port: 7860
	---

	# 📊 End-to-End MLOps Pipeline for Real-Time Reputation Monitoring

	![Build Status](https://img.shields.io/badge/build-passing-brightgreen)
	![Python](https://img.shields.io/badge/python-3.9%2B-blue)
	![Model](https://img.shields.io/badge/model-RoBERTa-yellow)
	![Deployment](https://img.shields.io/badge/deployed%20on-HuggingFace-orange)
	![License](https://img.shields.io/badge/license-MIT-green)

	### 👤 Author

	Fabio Celaschi

	<a href="https://www.linkedin.com/in/fabio-celaschi-4371bb92">
	<img src="https://img.shields.io/badge/LinkedIn-0077B5?style=for-the-badge&logo=linkedin&logoColor=white" alt="LinkedIn" />
	</a>

	<a href="https://www.instagram.com/fabiocelaschi/">
	<img src="https://img.shields.io/badge/Instagram-E4405F?style=for-the-badge&logo=instagram&logoColor=white" alt="Instagram" />
	</a>

	## 🚀 Project Overview

	This project is a comprehensive MLOps solution designed to monitor online company reputation through automated sentiment analysis of real-time news. It was developed to demonstrate scalable, production-ready machine learning engineering capabilities.

	Unlike standard static notebooks, this repository demonstrates a full-cycle ML workflow. The system scrapes live data from Google News, analyzes sentiment using a RoBERTa Transformer model, and visualizes insights via an interactive dashboard, all orchestrated within a Dockerized environment.

	### Key Features
	* Real-Time Data Ingestion: Automated scraping of Google News for target brand keywords.
	* State-of-the-Art NLP: Uses the `twitter-roberta-base-sentiment` RoBERTa model, which is tuned for social media and other short texts, to classify sentiment.
	* Full-Stack Architecture: Integrates a FastAPI backend for inference and a Streamlit frontend for visualization in a single container.
	* Automated Continuous Training (CT): Implements pipeline logic that checks for new data and simulates model fine-tuning during CI/CD execution.
	* CI/CD Automation: Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
	* Embedded Monitoring: Basic logging system to track model predictions and sentiment distribution over time.

	---

	## 🛠️ Tech Stack & Tools

	* Core: Python 3.9+
	* Machine Learning: Hugging Face Transformers, PyTorch, Scikit-learn.
	* Backend: FastAPI, Uvicorn (REST API).
	* Frontend: Streamlit (Interactive Dashboard).
	* Data Ingestion: `GoogleNews` library (Real-time scraping).
	* DevOps: Docker, GitHub Actions (CI/CD).
	* Deployment: Hugging Face Spaces (Docker SDK).

	---

	## ⚙️ Architecture & MLOps Workflow

	The project follows a rigorous MLOps pipeline to ensure reliability and speed of delivery:

	1. Data & Modeling:
	* Input: Real-time news titles and descriptions fetched dynamically.
	* Model: Pre-trained RoBERTa model optimized for social media and short-text sentiment.
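
As a rough sketch of this step, the snippet below joins a title and description into one string and maps the model output to a readable label. It assumes the checkpoint is the public `cardiffnlp/twitter-roberta-base-sentiment` (this README does not state the full model ID) and that it reports generic `LABEL_0`/`LABEL_1`/`LABEL_2` IDs. The helper names `build_text`, `normalize_prediction`, `load_classifier`, and `analyze` are illustrative and not taken from the repository.

```python
# Illustrative sketch; function names are hypothetical, not the repository's code.

# Assumed mapping for cardiffnlp/twitter-roberta-base-sentiment, which reports
# generic label IDs instead of human-readable names.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "neutral", "LABEL_2": "positive"}


def build_text(title, description=""):
    """Join a news title and description into one input string."""
    parts = [part.strip() for part in (title, description) if part and part.strip()]
    return ". ".join(parts)


def normalize_prediction(prediction):
    """Map a raw pipeline result ({'label': ..., 'score': ...}) to a readable label."""
    label = LABEL_NAMES.get(prediction["label"], prediction["label"])
    return {"label": label, "score": float(prediction["score"])}


def load_classifier(model_id="cardiffnlp/twitter-roberta-base-sentiment"):
    """Load the sentiment pipeline lazily (weights are downloaded on first use)."""
    # Imported here so the helpers above stay free of heavy dependencies.
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=model_id)


def analyze(classifier, title, description=""):
    """Classify one news item with an already loaded classifier."""
    raw = classifier(build_text(title, description), truncation=True)[0]
    return normalize_prediction(raw)
```

A typical call would be `analyze(load_classifier(), title, description)`, loading the classifier once and reusing it across requests.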

	2. Containerization (Docker):
	* The application is containerized using a custom `Dockerfile`.
	* Implements a custom `entrypoint.sh` script to run both the FastAPI backend (port 8000) and Streamlit frontend (port 7860) simultaneously.
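
The dual-process startup can be illustrated as follows. The repository does this in a shell script (`entrypoint.sh`) whose contents are not shown in this README, so the sketch below is a Python equivalent of the idea rather than the actual script. The Streamlit flags and the function names `build_commands` and `main` are assumptions; the ports and module paths come from this document.

```python
# Python sketch of a dual-process launcher; not the repository's entrypoint.sh.
import subprocess


def build_commands(api_port=8000, ui_port=7860):
    """Return the argv lists for the FastAPI backend and the Streamlit frontend."""
    api_cmd = [
        "uvicorn", "app.api.main:app",
        "--host", "0.0.0.0",
        "--port", str(api_port),
    ]
    ui_cmd = [
        "streamlit", "run", "streamlit_app/app.py",
        "--server.port", str(ui_port),
        "--server.address", "0.0.0.0",
    ]
    return api_cmd, ui_cmd


def main():
    """Start the backend in the background, then run the frontend in the foreground."""
    api_cmd, ui_cmd = build_commands()
    api_process = subprocess.Popen(api_cmd)
    try:
        # The call blocks until Streamlit exits, which keeps the container alive.
        return subprocess.call(ui_cmd)
    finally:
        # Stop the backend when the frontend stops, so no orphan process remains.
        api_process.terminate()
        api_process.wait()
```

Running the frontend in the foreground means the container's lifetime follows the process bound to the exposed port (7860), which is the port Hugging Face Spaces routes to via `app_port`.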

	3. CI/CD Pipeline (GitHub Actions):
	* Trigger: Pushes to the `main` branch.
	* Continuous Training: Checks the `data/` directory for new labeled datasets. If found, initiates a training simulation to demonstrate the retraining lifecycle.
	* Test: Executes the `pytest` suite to verify the API endpoints (`/health`, `/analyze`) and model loading.
	* Build: Verifies Docker image creation.
	* Deploy: Automatically pushes the validated code to Hugging Face Spaces.
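
The Continuous Training check could look roughly like the sketch below. How the pipeline decides that a dataset is "new" is not described in this README; here it is assumed that datasets are CSV files in `data/` and that the caller supplies the names of files already processed. The names `find_new_datasets` and `should_retrain` are hypothetical.

```python
# Hypothetical sketch of the "new data" check; not the repository's CI logic.
from pathlib import Path


def find_new_datasets(data_dir="data", processed=frozenset()):
    """Return CSV files in data_dir whose names are not in the processed set."""
    root = Path(data_dir)
    if not root.is_dir():
        # A missing directory simply means there is nothing to train on.
        return []
    return sorted(p for p in root.glob("*.csv") if p.name not in processed)


def should_retrain(data_dir="data", processed=frozenset()):
    """Decide whether the training simulation should be triggered."""
    return bool(find_new_datasets(data_dir, processed))
```

In a CI job, a step could call `should_retrain()` and run the training script only when it returns `True`.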

	4. Monitoring:
	* The system logs every prediction to a local CSV file, which is visualized in the "Monitoring" tab of the dashboard.
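
A minimal sketch of this kind of logging, using only the standard library, is shown below. The column names, the UTC timestamp, and the function names `log_prediction` and `sentiment_distribution` are assumptions for illustration; the actual CSV schema is not documented here.

```python
# Illustrative sketch of CSV-based prediction logging; the schema is assumed.
import csv
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["timestamp", "text", "label", "score"]


def log_prediction(path, text, label, score):
    """Append one prediction to the CSV log, writing the header on first use."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "text": text,
            "label": label,
            "score": score,
        })


def sentiment_distribution(path):
    """Count logged predictions per sentiment label."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return Counter(row["label"] for row in csv.DictReader(handle))
```

The counts returned by `sentiment_distribution` are the kind of aggregate a dashboard tab could plot over time.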

	---

	## 📂 Repository Structure

	```bash
	├── .github/workflows/      # CI/CD configurations (GitHub Actions)
	├── app/                    # Backend application code
	│   ├── api/                # FastAPI endpoints (main.py)
	│   ├── model/              # Model loader logic (RoBERTa)
	│   └── services/           # Google News scraping logic
	├── data/                   # Dataset storage for retraining
	├── streamlit_app/          # Frontend application code (app.py)
	├── src/                    # Training scripts (simulation)
	├── tests/                  # Unit and integration tests (Pytest)
	├── Dockerfile              # Container configuration
	├── entrypoint.sh           # Startup script for dual-process execution
	├── requirements.txt        # Project dependencies
	├── Appunti_Progetto.doc    # Notes and explanation of the project
	└── README.md               # Project documentation
	```

	## 💻 Installation & Usage

	To run this project locally using Docker (recommended):

	### 1. Clone the repository
	```bash
	git clone https://github.com/YOUR_USERNAME/SentimentAnalysis.git
	cd SentimentAnalysis
	```

	### 2. Build the Docker Image
	```bash
	docker build -t reputation-monitor .
	```

	### 3. Run the Container
	```bash
	docker run -p 7860:7860 reputation-monitor
	```

	Access the application at http://localhost:7860

	### Manual Installation (No Docker)

	If you prefer running it directly with Python:

	1. Install dependencies:

	```bash
	pip install -r requirements.txt
	```

	2. Start the Backend (FastAPI):

	```bash
	uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload
	```

	3. Start the Frontend (Streamlit) in a new terminal:

	```bash
	streamlit run streamlit_app/app.py
	```

	## ⚠️ Limitations & Future Roadmap

	* Data Persistence: Monitoring logs are currently stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) so that data survives container restarts.
	* Scalability: The current Google News scraper is synchronous. Future versions will implement asynchronous scraping (`aiohttp`) or a message queue (RabbitMQ/Celery) for high-volume processing.
	* Model Retraining: A placeholder pipeline (`src/train.py`) is included. A full implementation would require GPU resources and a labeled dataset for fine-tuning.

	## 🤝 Contributing

	Contributions are welcome! Please feel free to submit a Pull Request.

	## 📝 License

	Distributed under the MIT License. See `LICENSE` for more information.