MatijaMax/parallel-scraper

Parallel Scraper - Schneider Electric 2024

This application is a simple Reddit scraper written in Go. Given a topic, it finds related Reddit posts and extracts their comments, then classifies each comment as positive or negative using NLP (Natural Language Processing). Scraping runs in parallel for faster data collection, and the application is operated through a console-based interface.

Technologies

  • Go (Golang): The programming language used to develop the application.
  • Colly: A popular Go web scraping library, used here to scrape pages in parallel.
  • Sentiment: A Go library for Natural Language Processing, used for sentiment analysis (positive/negative) of the comments.
  • Console-based UI: An interactive console interface; no graphical user interface is required.

How the Application Works

  1. Reddit Search: The user enters a topic (e.g., "Trump win"), and the application searches Reddit for posts containing that topic.
  2. Scraping Posts and Comments: Using Colly, the application first scrapes Reddit search results for posts related to the topic, then visits each post to collect comments.
  3. Sentiment Analysis: The collected comments are processed using NLP to classify each one as positive or negative.
  4. Displaying Results: The results of sentiment analysis are displayed directly in the console.
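
The steps above can be sketched with the Go standard library alone. This is a minimal illustration, not the repository's actual code: the `Post` and `Result` types, the `analyzeAll` and `analyzePost` functions, and the sample data are all hypothetical, and the tiny word-list scorer is only a stand-in for the Sentiment library's model. Colly's scraping is not shown; the worker pool simply demonstrates how posts could be handled in parallel once their comments are available.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Post is a hypothetical stand-in for one scraped Reddit post and its comments.
type Post struct {
	Title    string
	Comments []string
}

// Result holds the number of positive and negative comments for one post.
type Result struct {
	Title    string
	Positive int
	Negative int
}

// Toy word lists: an assumption for illustration, not the Sentiment library's model.
var positiveWords = map[string]bool{"good": true, "great": true, "love": true}
var negativeWords = map[string]bool{"bad": true, "awful": true, "hate": true}

// isPositive counts word-list hits in a comment. A score above zero is treated
// as positive; everything else is treated as negative (an arbitrary choice).
func isPositive(comment string) bool {
	score := 0
	for _, w := range strings.Fields(strings.ToLower(comment)) {
		w = strings.Trim(w, ".,!?\"'")
		if positiveWords[w] {
			score++
		}
		if negativeWords[w] {
			score--
		}
	}
	return score > 0
}

// analyzePost classifies every comment of a single post.
func analyzePost(p Post) Result {
	r := Result{Title: p.Title}
	for _, c := range p.Comments {
		if isPositive(c) {
			r.Positive++
		} else {
			r.Negative++
		}
	}
	return r
}

// analyzeAll processes posts concurrently with a fixed number of workers.
// Each worker writes only to its own index, so result order matches input order.
func analyzeAll(posts []Post, workers int) []Result {
	if workers < 1 {
		workers = 1
	}
	results := make([]Result, len(posts))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = analyzePost(posts[i])
			}
		}()
	}
	for i := range posts {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Made-up sample data; the real application would scrape this from Reddit.
	posts := []Post{
		{Title: "Example post A", Comments: []string{"I love this!", "This is bad."}},
		{Title: "Example post B", Comments: []string{"Awful idea, I hate it."}},
	}
	for _, r := range analyzeAll(posts, 4) {
		fmt.Printf("%s: %d positive, %d negative\n", r.Title, r.Positive, r.Negative)
	}
}
```

In a real scraper the per-post work would be dominated by network requests rather than classification, which is where running several workers at once is likely to pay off.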
