Readme update
README.md
CHANGED
@@ -20,12 +20,13 @@ app_port: 7860

 **MachineInnovators Inc.** focuses on scalable, production-ready machine learning applications. This project is a comprehensive **MLOps solution** designed to monitor online company reputation through automated sentiment analysis of real-time news.

-Unlike standard static notebooks, this repository demonstrates a **full-cycle ML workflow**. The system scrapes live data from **Google News**, analyzes sentiment using a **RoBERTa Transformer** model, and visualizes insights via an interactive dashboard, all

 ### Key Features
 * **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords.
 * **State-of-the-Art NLP:** Utilizes `twitter-roberta-base-sentiment` for high-accuracy classification.
 * **Full-Stack Architecture:** Integrates a **FastAPI** backend for inference and a **Streamlit** frontend for visualization in a single container.
 * **CI/CD Automation:** Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
 * **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time.
@@ -57,6 +58,7 @@ The project follows a rigorous MLOps pipeline to ensure reliability and speed of

 3. **CI/CD Pipeline (GitHub Actions):**
 * **Trigger:** Pushes to the `main` branch.
 * **Test:** Executes the `pytest` suite to verify API endpoints (`/health`, `/analyze`) and model loading.
 * **Build:** Verifies Docker image creation.
 * **Deploy:** Automatically pushes the validated code to Hugging Face Spaces.
@@ -74,50 +76,52 @@ The project follows a rigorous MLOps pipeline to ensure reliability and speed of
 │   ├── api/  # FastAPI endpoints (main.py)
 │   ├── model/  # Model loader logic (RoBERTa)
 │   └── services/  # Google News scraping logic
 ├── streamlit_app/  # Frontend Application Code (app.py)
-├── src/  # Training
 ├── tests/  # Unit and integration tests (Pytest)
 ├── Dockerfile  # Container configuration
 ├── entrypoint.sh  # Startup script for dual-process execution
 ├── requirements.txt  # Project dependencies
 └── README.md  # Project documentation

 💻 Installation & Usage
 To run this project locally using Docker (Recommended):

-1. Clone the repository
-
-
 git clone https://github.com/YOUR_USERNAME/SentimentAnalysis.git
 cd SentimentAnalysis
-2. Build the Docker Image
-Bash

 docker build -t reputation-monitor .
-3. Run the Container
-Bash

 docker run -p 7860:7860 reputation-monitor
 Access the application at http://localhost:7860

-Manual Installation (No Docker)
 If you prefer running it directly with Python:

-Install dependencies:

-

-
-Start the Backend (FastAPI):

-

-
-Start the Frontend (Streamlit) in a new terminal:

-

-streamlit run streamlit_app/app.py
 ⚠️ Limitations & Future Roadmap
 Data Persistence: Currently, monitoring logs are stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts.
@@ -20,12 +20,13 @@ app_port: 7860

 **MachineInnovators Inc.** focuses on scalable, production-ready machine learning applications. This project is a comprehensive **MLOps solution** designed to monitor online company reputation through automated sentiment analysis of real-time news.

+Unlike standard static notebooks, this repository demonstrates a **full-cycle ML workflow**. The system scrapes live data from **Google News**, analyzes sentiment using a **RoBERTa Transformer** model, and visualizes insights via an interactive dashboard, all orchestrated within a Dockerized environment.

 ### Key Features
 * **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords.
 * **State-of-the-Art NLP:** Utilizes `twitter-roberta-base-sentiment` for high-accuracy classification.
 * **Full-Stack Architecture:** Integrates a **FastAPI** backend for inference and a **Streamlit** frontend for visualization in a single container.
+* **Automated Continuous Training (CT):** Implements pipeline logic that checks for new data and simulates model fine-tuning during CI/CD execution.
 * **CI/CD Automation:** Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
 * **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time.
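
The logging code behind the embedded monitoring feature is not part of this diff. As a minimal sketch, assuming a hypothetical CSV layout with `timestamp`, `text`, and `label` columns and two illustrative helpers, `log_prediction` and `sentiment_distribution` (neither name comes from the repository), tracking predictions and summarizing the sentiment distribution could look like this:

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# Hypothetical column layout; the project's real log schema is not shown in the diff.
FIELDS = ["timestamp", "text", "label"]


def log_prediction(writer, text, label):
    """Write one prediction row, stamped with the current UTC time."""
    writer.writerow({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "text": text,
        "label": label,
    })


def sentiment_distribution(rows):
    """Return the share of each label among the logged rows."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# An in-memory buffer stands in for the CSV log file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
log_prediction(writer, "Great quarterly results", "positive")
log_prediction(writer, "Product recall announced", "negative")
log_prediction(writer, "New office opened", "positive")
log_prediction(writer, "Shares flat", "neutral")

buffer.seek(0)
distribution = sentiment_distribution(csv.DictReader(buffer))
print(distribution)
```

In a running service the buffer would be replaced by a file opened in append mode, and the distribution could be recomputed per time window to follow sentiment over time.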

@@ -57,6 +58,7 @@ The project follows a rigorous MLOps pipeline to ensure reliability and speed of

 3. **CI/CD Pipeline (GitHub Actions):**
 * **Trigger:** Pushes to the `main` branch.
+* **Continuous Training:** Checks the `data/` directory for new labeled datasets. If any are found, starts a training simulation to demonstrate the retraining lifecycle.
 * **Test:** Executes the `pytest` suite to verify API endpoints (`/health`, `/analyze`) and model loading.
 * **Build:** Verifies Docker image creation.
 * **Deploy:** Automatically pushes the validated code to Hugging Face Spaces.
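
The Continuous Training step above is described only in prose. The sketch below shows one way such a check could work, under stated assumptions: datasets are CSV files, "new" means modified after the last training run, and the function names (`find_new_datasets`, `run_training_simulation`, `continuous_training_step`) are hypothetical. The repository's actual training script may detect new data differently.

```python
import os
import tempfile
from pathlib import Path


def find_new_datasets(data_dir, last_trained_at):
    """Return CSV files in data_dir modified after the last training run."""
    return sorted(
        path for path in Path(data_dir).glob("*.csv")
        if path.stat().st_mtime > last_trained_at
    )


def run_training_simulation(datasets):
    """Stand-in for fine-tuning: only reports which files would be used."""
    return [f"simulated fine-tuning on {path.name}" for path in datasets]


def continuous_training_step(data_dir, last_trained_at):
    """Run the simulation only when new datasets are present."""
    datasets = find_new_datasets(data_dir, last_trained_at)
    if not datasets:
        return []
    return run_training_simulation(datasets)


# Demo in a temporary directory with a fixed modification time.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / "labeled_batch.csv"
    dataset.write_text("text,label\nGreat service,positive\n")
    os.utime(dataset, (1_700_000_000, 1_700_000_000))
    fresh = continuous_training_step(tmp, last_trained_at=1_600_000_000)
    stale = continuous_training_step(tmp, last_trained_at=1_800_000_000)

print(fresh)
print(stale)
```

A CI job could call `continuous_training_step("data/", ...)` before the test stage and skip retraining when the returned list is empty.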

@@ -74,50 +76,52 @@ The project follows a rigorous MLOps pipeline to ensure reliability and speed of
 │   ├── api/  # FastAPI endpoints (main.py)
 │   ├── model/  # Model loader logic (RoBERTa)
 │   └── services/  # Google News scraping logic
+├── data/  # Dataset storage for retraining
 ├── streamlit_app/  # Frontend Application Code (app.py)
+├── src/  # Training scripts (Simulation)
 ├── tests/  # Unit and integration tests (Pytest)
 ├── Dockerfile  # Container configuration
 ├── entrypoint.sh  # Startup script for dual-process execution
 ├── requirements.txt  # Project dependencies
+├── Appunti_Progetto.doc  # Notes and explanation of the project
 └── README.md  # Project documentation

+
 💻 Installation & Usage
 To run this project locally using Docker (Recommended):

+### 1. Clone the repository
+```bash
 git clone https://github.com/YOUR_USERNAME/SentimentAnalysis.git
 cd SentimentAnalysis

+### 2. Build the Docker Image
+```bash
 docker build -t reputation-monitor .

+### 3. Run the Container
+```bash
 docker run -p 7860:7860 reputation-monitor
 Access the application at http://localhost:7860

+Manual Installation (No Docker):
 If you prefer running it directly with Python:

+1. Install dependencies:

+```bash
+pip install -r requirements.txt

+2. Start the Backend (FastAPI):

+```bash
+uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload

+3. Start the Frontend (Streamlit) in a new terminal:

+```bash
+streamlit run streamlit_app/app.py

 ⚠️ Limitations & Future Roadmap
 Data Persistence: Currently, monitoring logs are stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts.
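
To illustrate the persistence idea in the roadmap item above, here is a minimal sketch that stores predictions in a database table instead of a CSV file. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs without extra dependencies. The table name, columns, and helper functions are assumptions made for illustration, not the project's schema.

```python
import sqlite3

# Hypothetical schema; the real project may log different fields.
SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    text TEXT NOT NULL,
    label TEXT NOT NULL,
    score REAL NOT NULL
)
"""


def open_store(path):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_prediction(conn, text, label, score):
    """Insert one prediction; the context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO predictions (text, label, score) VALUES (?, ?, ?)",
            (text, label, score),
        )


def label_counts(conn):
    """Return the number of stored predictions per label."""
    rows = conn.execute(
        "SELECT label, COUNT(*) FROM predictions GROUP BY label ORDER BY label"
    )
    return dict(rows.fetchall())


# ":memory:" keeps the demo self-contained; a file path would persist the data.
conn = open_store(":memory:")
save_prediction(conn, "Record profits reported", "positive", 0.97)
save_prediction(conn, "Data breach disclosed", "negative", 0.91)
save_prediction(conn, "Award for customer care", "positive", 0.88)
counts = label_counts(conn)
print(counts)
conn.close()
```

With a file path on a mounted volume, or a PostgreSQL connection using the same SQL shape, the rows would survive container restarts, which the ephemeral CSV file does not.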