Faffio committed on
Commit
a33f647
·
1 Parent(s): 2539bae

Readme update

  1. README.md +23 -19
README.md CHANGED
@@ -20,12 +20,13 @@ app_port: 7860
20
 
21
  **MachineInnovators Inc.** focuses on scalable, production-ready machine learning applications. This project is a comprehensive **MLOps solution** designed to monitor online company reputation through automated sentiment analysis of real-time news.
22
 
23
- Unlike standard static notebooks, this repository demonstrates a **full-cycle ML workflow**. The system scrapes live data from **Google News**, analyzes sentiment using a **RoBERTa Transformer** model, and visualizes insights via an interactive dashboard, all orchestrate within a Dockerized environment.
24
 
25
  ### Key Features
26
  * **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords.
27
  * **State-of-the-Art NLP:** Utilizes `twitter-roberta-base-sentiment` for high-accuracy classification.
28
  * **Full-Stack Architecture:** Integrates a **FastAPI** backend for inference and a **Streamlit** frontend for visualization in a single container.
 
29
  * **CI/CD Automation:** Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
30
  * **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time.
31
 
@@ -57,6 +58,7 @@ The project follows a rigorous MLOps pipeline to ensure reliability and speed of
57
 
58
  3. **CI/CD Pipeline (GitHub Actions):**
59
  * **Trigger:** Pushes to the `main` branch.
 
60
  * **Test:** Executes `pytest` suite to verify API endpoints (`/health`, `/analyze`) and model loading.
61
  * **Build:** Verifies Docker image creation.
62
  * **Deploy:** Automatically pushes the validated code to Hugging Face Spaces.
@@ -74,50 +76,52 @@ The project follows a rigorous MLOps pipeline to ensure reliability and speed of
74
  │ ├── api/ # FastAPI endpoints (main.py)
75
  │ ├── model/ # Model loader logic (RoBERTa)
76
  │ └── services/ # Google News scraping logic
 
77
  ├── streamlit_app/ # Frontend Application Code (app.py)
78
- ├── src/ # Training simulation scripts
79
  ├── tests/ # Unit and integration tests (Pytest)
80
  ├── Dockerfile # Container configuration
81
  ├── entrypoint.sh # Startup script for dual-process execution
82
  ├── requirements.txt # Project dependencies
 
83
  └── README.md # Project documentation
84
 
 
85
  💻 Installation & Usage
86
  To run this project locally using Docker (Recommended):
87
 
88
- 1. Clone the repository
89
- Bash
90
-
91
  git clone [https://github.com/YOUR_USERNAME/SentimentAnalysis.git](https://github.com/YOUR_USERNAME/SentimentAnalysis.git)
92
  cd SentimentAnalysis
93
- 2. Build the Docker Image
94
- Bash
95
 
 
 
96
  docker build -t reputation-monitor .
97
- 3. Run the Container
98
- Bash
99
 
 
 
100
  docker run -p 7860:7860 reputation-monitor
101
  Access the application at http://localhost:7860
102
 
103
- Manual Installation (No Docker)
104
  If you prefer running it directly with Python:
105
 
106
- Install dependencies:
107
 
108
- Bash
 
109
 
110
- pip install -r requirements.txt
111
- Start the Backend (FastAPI):
112
 
113
- Bash
 
114
 
115
- uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload
116
- Start the Frontend (Streamlit) in a new terminal:
117
 
118
- Bash
 
119
 
120
- streamlit run streamlit_app/app.py
121
  ⚠️ Limitations & Future Roadmap
122
  Data Persistence: Currently, monitoring logs are stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts.
123
 
 
20
 
21
  **MachineInnovators Inc.** focuses on scalable, production-ready machine learning applications. This project is a comprehensive **MLOps solution** designed to monitor online company reputation through automated sentiment analysis of real-time news.
22
 
23
+ Unlike standard static notebooks, this repository demonstrates a **full-cycle ML workflow**. The system scrapes live data from **Google News**, analyzes sentiment using a **RoBERTa Transformer** model, and visualizes insights via an interactive dashboard, all orchestrated within a Dockerized environment.
24
 
25
  ### Key Features
26
  * **Real-Time Data Ingestion:** Automated scraping of Google News for target brand keywords.
27
  * **State-of-the-Art NLP:** Utilizes `twitter-roberta-base-sentiment` for high-accuracy classification.
28
  * **Full-Stack Architecture:** Integrates a **FastAPI** backend for inference and a **Streamlit** frontend for visualization in a single container.
29
+ * **Automated Continuous Training (CT):** Implements a pipeline logic that checks for new data and simulates model fine-tuning during CI/CD execution.
30
  * **CI/CD Automation:** Robust GitHub Actions pipeline for automated testing, building, and deployment to Hugging Face Spaces.
31
  * **Embedded Monitoring:** Basic logging system to track model predictions and sentiment distribution over time.
32
 
 
58
 
59
  3. **CI/CD Pipeline (GitHub Actions):**
60
  * **Trigger:** Pushes to the `main` branch.
61
+ * **Continuous Training:** Checks the `data/` directory for new labeled datasets. If found, initiates a training simulation to demonstrate the retraining lifecycle.
62
  * **Test:** Executes `pytest` suite to verify API endpoints (`/health`, `/analyze`) and model loading.
63
  * **Build:** Verifies Docker image creation.
64
  * **Deploy:** Automatically pushes the validated code to Hugging Face Spaces.
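The **Continuous Training** step added in this commit (check `data/` for new labeled datasets, then run a training simulation) could look roughly like the sketch below. This is an illustration under stated assumptions, not the repository's actual script: the helper names `find_new_datasets` and `run_training_simulation`, the `.csv` extension, and the JSON state file that remembers already-processed datasets are all hypothetical.

```python
import json
from pathlib import Path
from typing import List


def _load_seen(state_file: Path) -> set:
    """Read the names of datasets that were already processed (hypothetical state file)."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()


def find_new_datasets(data_dir: Path, state_file: Path) -> List[Path]:
    """Return CSV files in data_dir that are not yet recorded in state_file."""
    seen = _load_seen(state_file)
    return sorted(p for p in data_dir.glob("*.csv") if p.name not in seen)


def run_training_simulation(datasets: List[Path], state_file: Path) -> bool:
    """Simulate fine-tuning on each new dataset and mark it as processed.

    Returns False when there is nothing to train on, so a CI job can skip the step.
    """
    if not datasets:
        return False
    seen = _load_seen(state_file)
    for path in datasets:
        # A real pipeline would fine-tune the model here; the simulation
        # only counts data rows (all lines minus the header).
        with path.open() as handle:
            rows = max(sum(1 for _ in handle) - 1, 0)
        print(f"Simulating fine-tuning on {path.name} ({rows} rows)")
        seen.add(path.name)
    state_file.write_text(json.dumps(sorted(seen)))
    return True
```

In a workflow step, one might call `run_training_simulation(find_new_datasets(Path("data"), Path(".ct_state.json")), Path(".ct_state.json"))` before the test stage; the state file path is likewise an assumption.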
 
76
  │ ├── api/ # FastAPI endpoints (main.py)
77
  │ ├── model/ # Model loader logic (RoBERTa)
78
  │ └── services/ # Google News scraping logic
79
+ ├── data/ # Dataset storage for retraining
80
  ├── streamlit_app/ # Frontend Application Code (app.py)
81
+ ├── src/ # Training scripts (Simulation)
82
  ├── tests/ # Unit and integration tests (Pytest)
83
  ├── Dockerfile # Container configuration
84
  ├── entrypoint.sh # Startup script for dual-process execution
85
  ├── requirements.txt # Project dependencies
86
+ ├── Appunti_Progetto.doc # Note and explanation of the project
87
  └── README.md # Project documentation
88
 
89
+
90
  💻 Installation & Usage
91
  To run this project locally using Docker (Recommended):
92
 
93
+ ### 1. Clone the repository
94
+ ```bash
 
95
  git clone [https://github.com/YOUR_USERNAME/SentimentAnalysis.git](https://github.com/YOUR_USERNAME/SentimentAnalysis.git)
96
  cd SentimentAnalysis
 
 
97
 
98
+ ### 2. Build the Docker Image
99
+ ```bash
100
  docker build -t reputation-monitor .
 
 
101
 
102
+ ### 3. Run the Container
103
+ ```bash
104
  docker run -p 7860:7860 reputation-monitor
105
  Access the application at http://localhost:7860
106
 
107
+ Manual Installation (No Docker):
108
  If you prefer running it directly with Python:
109
 
110
+ 1. Install dependencies:
111
 
112
+ ```bash
113
+ pip install -r requirements.txt
114
 
115
+ 2. Start the Backend (FastAPI):
 
116
 
117
+ ```bash
118
+ uvicorn app.api.main:app --host 0.0.0.0 --port 8000 --reload
119
 
120
+ 3. Start the Frontend (Streamlit) in a new terminal:
 
121
 
122
+ ```bash
123
+ streamlit run streamlit_app/app.py
124
 
 
125
  ⚠️ Limitations & Future Roadmap
126
  Data Persistence: Currently, monitoring logs are stored in an ephemeral CSV file. In a production environment, this would be replaced by a persistent database (e.g., PostgreSQL) to ensure data retention across container restarts.
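The persistence limitation above can be illustrated with a small sketch of database-backed prediction logging. It uses Python's built-in `sqlite3` as a stand-in for the PostgreSQL database the README mentions, so the example stays self-contained; the table name `predictions`, its columns, and the helper functions are assumptions and not part of the project.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the log database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS predictions (
               ts    TEXT NOT NULL,  -- UTC timestamp in ISO 8601 format
               text  TEXT NOT NULL,  -- analyzed headline
               label TEXT NOT NULL,  -- predicted sentiment label
               score REAL NOT NULL   -- model confidence
           )"""
    )
    conn.commit()
    return conn


def log_prediction(conn: sqlite3.Connection, text: str, label: str, score: float) -> None:
    """Append one prediction to the log table."""
    ts = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO predictions (ts, text, label, score) VALUES (?, ?, ?, ?)",
        (ts, text, label, score),
    )
    conn.commit()


def sentiment_distribution(conn: sqlite3.Connection) -> dict:
    """Return a {label: count} mapping over all logged predictions."""
    rows = conn.execute(
        "SELECT label, COUNT(*) FROM predictions GROUP BY label"
    ).fetchall()
    return dict(rows)
```

Passing a file path on a mounted volume to `init_db` would let the log survive container restarts; a PostgreSQL deployment would swap the connection for a client library while keeping the same two operations.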
127