
Twitter Crawler

Project provided by Prof. Vagelis Hristidis for CS 242 Information Retrieval at UCR.

The crawler mainly uses the tweepy module to access Twitter's APIs. It starts from a popular Twitter account and crawls that account's tweets along with those of its followers and the accounts it follows, then recursively crawls through all of those accounts to collect a large mass of tweets in JSON format.
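A minimal sketch of this breadth-first expansion, assuming the tweepy 3.x-style API (the function and variable names here are illustrative, not the repository's actual code):

```python
import tweepy

def crawl(api, seed, max_users=100):
    """Breadth-first expansion outward from a seed screen name."""
    queue, seen, tweets = [seed], {seed}, []
    while queue and len(seen) < max_users:
        name = queue.pop(0)
        # Collect this account's recent tweets as raw JSON dicts.
        for status in tweepy.Cursor(api.user_timeline,
                                    screen_name=name).items(200):
            tweets.append(status._json)
        # Enqueue followers and friends (the accounts this user follows).
        for endpoint in (api.followers, api.friends):
            for user in tweepy.Cursor(endpoint, screen_name=name).items(50):
                if user.screen_name not in seen:
                    seen.add(user.screen_name)
                    queue.append(user.screen_name)
    return tweets
```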

The main rule was to crawl only tweets that are geo-tagged. To enforce this, the crawler checks each user's geo_enabled field and crawls that user's tweets only if the field is enabled.
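The gate on geo_enabled might look like the following (a hedged sketch; get_user, user_timeline, and the geo_enabled attribute are real tweepy 3.x-era API surface, but the surrounding structure is illustrative):

```python
import tweepy

def geo_enabled_tweets(api, screen_name):
    """Return the user's recent tweets only if geo-tagging is enabled."""
    user = api.get_user(screen_name=screen_name)
    if not user.geo_enabled:
        return []  # Rule: skip users who have not enabled geo-tagging.
    return [status._json
            for status in tweepy.Cursor(api.user_timeline,
                                        screen_name=screen_name).items(200)]
```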

Download the repository files and navigate to the directory containing them. Before running the crawler, edit keys.txt so that it contains the Twitter consumer key, consumer secret, access token, and access token secret on separate lines; these credentials are obtained by creating an application through a Twitter developer account (see the sketch after the commands below for how they are consumed). To run the script and start the crawler, do the following according to your operating system:

Windows: Type crawler

Linux/Unix: Type ./crawler.sh
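The four lines of keys.txt feed tweepy's OAuth handshake roughly as follows (a minimal sketch; only the keys.txt name and the line order come from the instructions above):

```python
import tweepy

# Read the four credentials, one per line, in the order listed above.
with open("keys.txt") as f:
    consumer_key, consumer_secret, access_token, access_secret = \
        [line.strip() for line in f][:4]

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)  # back off when rate-limited
```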

Then enter the screen name of the user from which to begin crawling. Tweets will be written to the file output.txt in JSON format.
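Each tweet's raw JSON can then be appended to output.txt along these lines (a sketch; the one-JSON-object-per-line layout is an assumption, and only the output.txt name comes from the text above):

```python
import json

def write_tweets(tweets, path="output.txt"):
    """Append each crawled tweet as one JSON object per line."""
    with open(path, "a") as out:
        for tweet in tweets:  # each `tweet` is a raw JSON dict from tweepy
            out.write(json.dumps(tweet) + "\n")
```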

Modules used: json, requests.packages.urllib3, sys, os, tweepy
