
Machine learning pipeline for IMDB movie review sentiment analysis using Logistic Regression, FastAPI, and Docker. Track experiments with Neptune.ai and explore an interactive web interface. 🚀


himarygr/imdb-sentiment-classifier


IMDB Sentiment Analysis Project

🎯 Project Overview

This project implements a machine learning pipeline to classify IMDB movie reviews into Positive or Negative sentiments. The pipeline includes data preprocessing, model training, evaluation, and a web-based interface for predictions. The project is fully containerized using Docker and uses Neptune.ai for experiment tracking and visualization.


Example of negative prediction

Confusion Matrix

Example of positive prediction

ROC-AUC Curve

🚀 Features

  1. Data Preprocessing: Cleaning and vectorizing movie reviews using TF-IDF.
  2. Model Training: Logistic Regression with hyperparameter tuning using Random Search.
  3. Evaluation Metrics: Accuracy, F1-score, Confusion Matrix, and ROC-AUC curve.
  4. Neptune.ai Integration: Logs experiments, metrics, and visualizations.
  5. Web Interface: Simple frontend (HTML, CSS, JS) for users to input reviews and get predictions.
  6. Containerization: Backend and frontend are containerized with Docker and orchestrated using Docker Compose.
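As a rough illustration of item 2 (not code from this repository): a trained logistic regression model turns a TF-IDF vector x into the score z = w·x + b and the positive-class probability σ(z), and labels the review positive when z > 0, which is the same as σ(z) > 0.5. The sketch assumes sparse features stored as a term-to-weight dict; `predict_sentiment`, `WEIGHTS`, and `BIAS` are hypothetical toy values. In the project itself, scikit-learn's `LogisticRegression.predict` and `predict_proba` perform this step.

```python
import math
from typing import Dict, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_sentiment(
    features: Dict[str, float], weights: Dict[str, float], bias: float
) -> Tuple[str, float]:
    """Score a sparse TF-IDF vector with a linear model.

    Returns the predicted label and the estimated probability of "positive".
    """
    # Linear score w·x + b, summed only over terms present in the review;
    # terms without a learned weight contribute nothing.
    z = bias + sum(value * weights.get(term, 0.0) for term, value in features.items())
    # Positive when the decision score is above zero (probability above 0.5).
    label = "positive" if z > 0 else "negative"
    return label, sigmoid(z)


# Hypothetical toy weights, NOT taken from best_sentiment_model.pkl.
WEIGHTS = {"fantastic": 2.1, "great": 1.5, "boring": -2.3, "awful": -2.8}
BIAS = -0.1

# z = -0.1 + 0.7 * 2.1 + 0.5 * 1.5 = 2.12 > 0, so the label is "positive".
label, p = predict_sentiment({"fantastic": 0.7, "acting": 0.5, "great": 0.5}, WEIGHTS, BIAS)
```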

🛠️ Technologies Used

  • Python 3.9: Main programming language.
  • FastAPI: Backend framework for serving predictions.
  • Scikit-learn: ML library for Logistic Regression and TF-IDF.
  • Neptune.ai: For experiment tracking.
  • Docker & Docker Compose: Containerization of the application.
  • HTML, CSS, JavaScript: Frontend interface.
  • Matplotlib & Seaborn: Visualization tools.
  • Pandas & NumPy: Data handling and processing.

📂 Project Structure

IMDB-Review-Classifier/
│
├── data/
│   ├── raw/                 # Raw dataset (IMDB Dataset.csv)
│   └── processed/           # Processed TF-IDF data and labels
│       ├── X_train_tfidf.npz
│       ├── X_test_tfidf.npz
│       ├── y_train.csv
│       ├── y_test.csv
│       └── tfidf_vectorizer.pkl
│
├── model/
│   └── best_sentiment_model.pkl   # Trained Logistic Regression model
│
├── backend/
│   ├── requirements.txt       # Python dependencies
│   ├── api.py                 # FastAPI backend for predictions
│   ├── train_model.py         # Model training with Random Search and Neptune logging
│   └── data_processing.py     # Data cleaning and TF-IDF processing
│
├── frontend/
│   ├── index.html             # Web interface
│   ├── style.css              # Styling for the web interface
│   └── static/                # Static assets like images (snowflakes, icons)
│
├── docker/
│   ├── Dockerfile.backend     # Dockerfile for the backend
│   ├── Dockerfile.frontend    # Dockerfile for the frontend
│   └── docker-compose.yml     # Docker Compose configuration
│
└── README.md                  # Project documentation

📊 Dataset


⚙️ Setup Instructions

1. Clone the repository

git clone https://github.com/himarygr/IMDB-Review-Classifier.git
cd IMDB-Review-Classifier

2. Install dependencies (for local development)

Backend

pip install -r backend/requirements.txt

Frontend

No installation is required for the static frontend.

3. Data Preprocessing

From the repository root, run the following script to clean the reviews and vectorize them with TF-IDF (outputs go to data/processed/):

python backend/data_processing.py
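Since data_processing.py is not reproduced here, the following is only a minimal stdlib sketch of what such a step typically does, assuming the cleaning strips HTML tags (IMDB reviews contain `<br />`), lowercases, and drops non-letter characters. The TF-IDF part mirrors scikit-learn's `TfidfVectorizer` defaults (raw term counts, smoothed IDF ln((1 + n) / (1 + df)) + 1, L2 normalization); the real script would use `TfidfVectorizer` directly. The names `clean_review`, `fit_idf`, and `tfidf_vector` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

TAG_RE = re.compile(r"<[^>]+>")          # HTML tags such as <br /> in raw reviews
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # anything that is not a lowercase letter or whitespace


def clean_review(text: str) -> str:
    """Assumed cleaning: strip tags, lowercase, keep letters, collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text.lower())
    return " ".join(text.split())


def fit_idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF as in TfidfVectorizer(smooth_idf=True): ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens: List[str], idf: Dict[str, float]) -> Dict[str, float]:
    """Sparse TF-IDF vector; terms unseen during fitting are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    raw = {term: c * idf[term] for term, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    # L2 normalization makes long and short reviews comparable.
    return {term: v / norm for term, v in raw.items()} if norm else {}


# Toy training reviews (illustrative only, not from the IMDB dataset).
reviews = ["Great acting.<br /><br />Great story!", "Boring plot, awful acting."]
tokenized = [clean_review(r).split() for r in reviews]
idf = fit_idf(tokenized)
vectors = [tfidf_vector(doc, idf) for doc in tokenized]
```

Note that "acting" appears in both toy reviews, so it receives the minimum IDF of 1 and ends up with the smallest weight in the first vector.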

4. Model Training

Run model training with Random Search and log metrics to Neptune.ai:

python backend/train_model.py
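The random search idea behind train_model.py can be sketched in a few lines: sample candidate hyperparameters, score each, keep the best. This is an illustrative stdlib version, not the repository's code; the search space below is an assumption that only reuses the hyperparameter names logged to Neptune.ai (C, solver, max_iter). With scikit-learn the same idea is `RandomizedSearchCV`, typically with `scipy.stats.loguniform` for C. `sample_params`, `random_search`, and `toy_score` are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

Params = Dict[str, object]


def sample_params(rng: random.Random) -> Params:
    """Draw one candidate configuration (assumed search space)."""
    return {
        # C is sampled log-uniformly in [1e-3, 1e2] because its effect is multiplicative.
        "C": 10 ** rng.uniform(-3, 2),
        "solver": rng.choice(["liblinear", "lbfgs", "saga"]),
        "max_iter": rng.choice([100, 200, 500]),
    }


def random_search(
    score_fn: Callable[[Params], float], n_iter: int = 20, seed: int = 42
) -> Tuple[Optional[Params], float, List[Tuple[Params, float]]]:
    """Evaluate n_iter random candidates and return (best_params, best_score, history)."""
    rng = random.Random(seed)  # fixed seed makes the search reproducible
    history: List[Tuple[Params, float]] = []
    best_params: Optional[Params] = None
    best_score = -math.inf
    for _ in range(n_iter):
        params = sample_params(rng)
        score = score_fn(params)  # e.g. mean cross-validated F1 for this configuration
        history.append((params, score))
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, history


def toy_score(params: Params) -> float:
    """Hypothetical objective that peaks at C = 1, standing in for a real CV score."""
    return -abs(math.log10(params["C"]))


best, best_score, history = random_search(toy_score, n_iter=20, seed=0)
```

In a real run, `score_fn` would fit a `LogisticRegression` with the sampled parameters and return a validation metric; each `(params, score)` pair is what gets logged per trial.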

5. Run with Docker Compose

To build and run the project using Docker:

cd docker
docker-compose up --build
  • Backend will run on: http://localhost:8000
  • Frontend will run on: http://localhost:8501

🔗 Endpoints (Backend API)

Method   Endpoint    Description
POST     /predict/   Predict the sentiment of a review

Example Request:

{
  "review": "The movie was absolutely fantastic! Great acting and direction."
}

Example Response:

{
  "sentiment": "positive"
}
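A client only needs to POST the JSON body shown in the example request and read the "sentiment" field of the response. The following stdlib sketch assumes the backend is reachable at http://localhost:8000 (the Docker Compose address above); `build_predict_request` and `predict` are hypothetical helper names, not part of the repository.

```python
import json
import urllib.request


def build_predict_request(
    review: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST /predict/ request carrying {"review": ...} as JSON."""
    body = json.dumps({"review": review}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(review: str, base_url: str = "http://localhost:8000", timeout: float = 10.0) -> str:
    """Send the review to a running backend and return the "sentiment" field."""
    req = build_predict_request(review, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["sentiment"]


# With the backend running (e.g. via docker-compose up), a call looks like:
# predict("The movie was absolutely fantastic! Great acting and direction.")
```

Building the request separately from sending it keeps the payload format easy to inspect without a running server.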

🖥️ Web Interface

The frontend provides a simple interface where users can:

  1. Enter a movie review.
  2. Click the "Analyze Sentiment" button.
  3. See whether the review is classified as Positive 😊 or Negative 😞.

🧪 Experiment Tracking

All experiments, metrics, and visualizations are logged to Neptune.ai.

Logged Items:

  1. Hyperparameters: C, solver, max_iter.
  2. Metrics: Accuracy, F1-score.
  3. Confusion Matrix: Uploaded as an image.
  4. ROC-AUC Curve: Uploaded as an image.
  5. CPU & Memory Usage: System resource monitoring.
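For reference, the evaluation metrics logged above reduce to simple counts. This stdlib sketch shows what scikit-learn's `confusion_matrix`, `accuracy_score`, `f1_score`, and `roc_auc_score` compute for a binary problem, assuming labels are encoded as positive = 1 and negative = 0 (an assumption; the repository's encoding is not shown). The ROC-AUC here uses the pairwise ranking definition, which is O(P·N) and meant only for illustration.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """2x2 matrix [[TN, FP], [FN, TP]]: rows = true label, columns = predicted label."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1-score for the positive class (label 1)."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels, hard predictions, and predicted positive-class probabilities.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

cm = confusion_matrix(y_true, y_pred)    # [[2, 1], [1, 2]]
acc, f1 = accuracy_and_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)            # 8 of 9 positive/negative pairs ranked correctly
```

ROC-AUC needs the probabilities (e.g. from `predict_proba`), not the hard labels, which is why it is computed from `scores` rather than `y_pred`.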

🎨 Visualizations in Neptune.ai

  • Confusion Matrix
  • ROC-AUC Curve
  • Accuracy and F1-Score
  • Hyperparameter values
  • CPU/Memory usage during training

Neptune dashboard

🔮 Future Improvements

  • Add more classifiers (e.g., SVM, Random Forest) for comparison.
  • Integrate Grid Search for exhaustive hyperparameter tuning.
  • Deploy the project to a cloud service (AWS, GCP, etc.).
  • Enhance the frontend with a modern framework (React or Vue.js).

🤝 Contributing

Feel free to fork the repository, create a branch, and submit pull requests for new features or bug fixes!


📜 License

This project is licensed under the MIT License.


📞 Contact

For any questions or suggestions:
