Reddit Posts Sentiment Analysis
The Reddit Posts Sentiment Analysis project provides a tool for analyzing the sentiment of a post based on its comments. It uses natural language processing and machine learning to classify each comment as positive, negative, or neutral, then assigns an overall sentiment to the original post.
- Batch data collection: The system collects posts and comments in batches from best / new / trending topics.
- Sentiment analysis: Posts and comments are analyzed using a machine learning model to determine their sentiment.
- Interactive dashboard: Visualizes sentiment trends over time for different topics.
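A minimal sketch of how comment-level labels might be rolled up into one label for the post. The majority-vote rule, the `min_share` cutoff, and the function name `post_sentiment` are assumptions for illustration, not the project's confirmed method.

```python
from collections import Counter


def post_sentiment(comment_labels, min_share=0.5):
    """Roll comment-level labels up to a single label for the post.

    The most frequent label wins if it covers at least `min_share` of the
    comments; otherwise the post is treated as "neutral" (an assumption).
    """
    if not comment_labels:
        return "neutral"  # assumption: no comments means no signal
    label, votes = Counter(comment_labels).most_common(1)[0]
    return label if votes / len(comment_labels) >= min_share else "neutral"
```

For example, `post_sentiment(["positive", "positive", "negative"])` returns `"positive"`, because two of the three comments agree.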
This project is divided into 3 main parts:
- Data part: Responsible for scraping, processing, and storing the raw data in a database.
- Model part: Responsible for ingesting data from the database, processing it, and passing it to the machine learning model to get the sentiment.
- Analysis part: Responsible for analyzing the results and data.
Data ingestion: This module is responsible for scraping data from Reddit using the PRAW library.
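A rough sketch of what the scraping step could look like with PRAW. The record fields, the credential placeholders, and the helper names (`submission_to_record`, `comment_to_record`, `fetch_batch`) are assumptions. PRAW subreddits expose `new`, `hot`, and `top` listings, so mapping "best / trending" onto `top` / `hot` is also an assumption.

```python
def submission_to_record(submission):
    """Flatten a PRAW Submission into a plain dict (field choice is an assumption)."""
    return {
        "post_id": submission.id,
        "title": submission.title,
        "body": submission.selftext,
        "score": submission.score,
        "created_utc": submission.created_utc,
    }


def comment_to_record(comment, post_id):
    """Flatten a PRAW Comment and keep a reference to its parent post."""
    return {
        "comment_id": comment.id,
        "post_id": post_id,
        "body": comment.body,
        "score": comment.score,
    }


def fetch_batch(subreddit_name, listing="new", limit=25):
    """Collect one batch of posts and their comments from a subreddit."""
    import praw  # third-party; imported lazily so the helpers above work without it

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder
        client_secret="YOUR_CLIENT_SECRET",  # placeholder
        user_agent="reddit-sentiment-sketch",
    )
    subreddit = reddit.subreddit(subreddit_name)
    listing_fn = {"new": subreddit.new, "hot": subreddit.hot, "top": subreddit.top}[listing]

    posts, comments = [], []
    for submission in listing_fn(limit=limit):
        posts.append(submission_to_record(submission))
        submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
        for comment in submission.comments.list():
            comments.append(comment_to_record(comment, submission.id))
    return posts, comments
```

Keeping the flattening helpers separate from the network call makes them easy to test with stand-in objects.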
AWS Module:
Data Retriever: Using AWS Kinesis Data Firehose, I ingest data in batches into the S3 bucket.
Data Storage: Using S3 as the data lake, I store all the raw data.
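A minimal sketch of pushing scraped rows to a Firehose delivery stream in batches, assuming the stream is already configured to deliver to the S3 bucket. The stream name and helper names are hypothetical. The client is passed in so the logic can be tried without AWS credentials.

```python
import json

FIREHOSE_MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def to_firehose_records(rows):
    """Encode rows as newline-delimited JSON so S3 objects can be split back into rows."""
    return [{"Data": (json.dumps(row) + "\n").encode("utf-8")} for row in rows]


def send_in_batches(client, stream_name, rows, batch_size=FIREHOSE_MAX_BATCH):
    """Send rows in chunks and return how many records Firehose reported as failed."""
    failed = 0
    records = to_firehose_records(rows)
    for start in range(0, len(records), batch_size):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + batch_size],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

In practice the client would come from `boto3.client("firehose")`, and a non-zero return value would indicate records that should be retried.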
Data Processing:
Using AWS SNS for notifications when data lands in the S3 bucket.
Using AWS SQS as the queue system.
Using AWS Lambda (Python) for data processing.
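A sketch of a Lambda handler for this flow, assuming the services are chained as S3 event notification → SNS topic → SQS queue → Lambda, and that raw objects hold newline-delimited JSON. Both are assumptions, and the actual cleaning logic is left as a placeholder.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an SQS event carrying S3 notifications."""
    objects = []
    for sqs_record in event.get("Records", []):
        body = json.loads(sqs_record["body"])
        # SNS wraps the S3 notification in "Message" unless raw delivery is enabled.
        s3_event = json.loads(body["Message"]) if "Message" in body else body
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = unquote_plus(s3_record["s3"]["object"]["key"])  # keys arrive URL-encoded
            objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Read each newly written raw object and count the rows it contains."""
    import boto3  # available in the Lambda Python runtime

    s3 = boto3.client("s3")
    processed = 0
    for bucket, key in extract_s3_objects(event):
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = [json.loads(line) for line in raw.splitlines() if line.strip()]
        processed += len(rows)  # placeholder: cleaning and loading would go here
    return {"processed": processed}
```

Unwrapping the SNS envelope in its own function keeps the parsing testable without touching S3.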
Data storage: Using Supabase as the SQL database to store the processed data in 2 tables: POSTS, COMMENTS.
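A minimal sketch of writing processed rows to the two tables with the `supabase` Python client. The primary-key column names (`post_id`, `comment_id`) and the helper names are assumptions. Only the table names POSTS and COMMENTS come from this README.

```python
def dedupe(rows, key):
    """Keep the last row seen for each primary-key value."""
    return list({row[key]: row for row in rows}.values())


def store(client, posts, comments):
    """Upsert posts first, then comments, so comments can reference existing posts."""
    posts = dedupe(posts, "post_id")
    comments = dedupe(comments, "comment_id")
    if posts:
        client.table("POSTS").upsert(posts).execute()
    if comments:
        client.table("COMMENTS").upsert(comments).execute()
    return len(posts), len(comments)
```

The client would typically be built with `supabase.create_client(url, key)`. Upserting rather than inserting means a batch can be re-processed without creating duplicates.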
- Using a semi-supervised method: I train a model on public Reddit text data to predict sentiment.
- I will use this model to generate pseudo-labels for my data.
- I adopt a multi-input neural network architecture.
- I begin by reusing the pre-trained model's hidden layers and freezing their weights.
- I add the features and columns that are unique to my dataset by constructing a new hidden layer dedicated to processing that information.
- To combine the insights from the text and the additional columns, I add a layer that merges the output of the pre-trained layers with the output of the new columns layer.
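The modelling steps above can be sketched in two pieces: a confidence filter for pseudo-labels, and a multi-input network built with the Keras functional API. This is a sketch under stated assumptions, not the project's actual model. The 0.9 threshold, the 32-unit hidden layer, concatenation as the merge layer, and the function names are all assumptions, and `text_encoder` is assumed to be a pre-trained Keras model that takes already-vectorized text features.

```python
def pseudo_label(texts, predict_proba,
                 labels=("negative", "neutral", "positive"), threshold=0.9):
    """Keep only the texts that the base model labels with high confidence."""
    kept = []
    for text in texts:
        probs = predict_proba(text)  # one probability per label
        best = max(range(len(labels)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            kept.append((text, labels[best]))
    return kept


def build_multi_input_model(text_encoder, text_shape, n_extra_features, n_classes=3):
    """Frozen pre-trained text branch plus a new branch for extra columns."""
    from tensorflow import keras  # imported lazily; assumes TensorFlow is installed

    text_encoder.trainable = False  # freeze the pre-trained hidden layers

    # Branch 1: text features passed through the frozen encoder.
    text_in = keras.Input(shape=text_shape, name="text_features")
    text_repr = text_encoder(text_in, training=False)

    # Branch 2: a new hidden layer for the dataset-specific columns.
    extra_in = keras.Input(shape=(n_extra_features,), name="extra_columns")
    extra_repr = keras.layers.Dense(32, activation="relu")(extra_in)

    # Merge both branches and classify.
    merged = keras.layers.Concatenate()([text_repr, extra_repr])
    output = keras.layers.Dense(n_classes, activation="softmax")(merged)
    return keras.Model(inputs=[text_in, extra_in], outputs=output)
```

Only the new dense layer and the output layer are trained. The frozen encoder keeps what it learned from the public Reddit data, and the high-confidence pseudo-labels supply training targets for the project's own posts and comments.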
configure -
data collection -
data processing -
data storage
data ingestion -
data transformation - model training, evaluation, tuning
- Deploy, Monitoring
- add unit tests
- add integration tests
- LinkedIn: []