This project demonstrates how to work with the Twitter API in Python. Using the Tweepy library, you can scrape data from Twitter. The project also shows how to extract, transform, and load the data into a CSV file and a MongoDB database.
Write a script that downloads tweets data on a specific search topic using the standard search API. The script should contain the following functions:
- scrape_tweets(), which takes the following parameters:
  - the search topic
  - the number of tweets to download per request
  - the number of requests
  and returns a dataframe.
- save_results_as_csv(), which takes the following parameter:
  - the dataframe returned by scrape_tweets()
  and writes a CSV file with the following naming format:
  tweets_downloaded_yymmdd_hhmmss.csv (where ‘yymmdd_hhmmss’ is the current timestamp)
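A minimal sketch of save_results_as_csv(), assuming the dataframe comes from scrape_tweets() and that pandas' to_csv is used for writing; the timestamp follows the naming convention above:

```python
from datetime import datetime

import pandas as pd


def save_results_as_csv(df: pd.DataFrame) -> str:
    """Write the tweets dataframe to a timestamped CSV and return the filename."""
    timestamp = datetime.now().strftime("%y%m%d_%H%M%S")
    filename = f"tweets_downloaded_{timestamp}.csv"
    df.to_csv(filename, index=False)
    return filename
```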
The following attributes of the tweets should be extracted:
- Tweet text
- Tweet id
- Source
- Coordinates
- Retweet count
- Likes count
- User info:
  - Username
  - Screen name
  - Location
  - Friends count
  - Verification status
  - Description
  - Followers count
Make sure not to include retweets.
Make sure the same tweet does not appear multiple times in your final CSV. A sketch of how scrape_tweets() might be implemented follows.
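One way scrape_tweets() could look, assuming Tweepy 3.x (where the standard search endpoint is api.search; in Tweepy 4 it is api.search_tweets) and an already-authenticated API client passed in as an extra parameter for clarity:

```python
import pandas as pd
import tweepy


def scrape_tweets(api: tweepy.API, topic: str,
                  tweets_per_request: int, num_requests: int) -> pd.DataFrame:
    """Download tweets about `topic` and return the extracted fields as a dataframe."""
    records = []
    max_id = None
    for _ in range(num_requests):
        # "-filter:retweets" tells the standard search API to leave retweets out.
        results = api.search(
            q=f"{topic} -filter:retweets",
            count=tweets_per_request,
            tweet_mode="extended",
            max_id=max_id,
        )
        if not results:
            break
        for tweet in results:
            records.append({
                "text": tweet.full_text,
                "tweet_id": tweet.id,
                "source": tweet.source,
                "coordinates": tweet.coordinates,
                "retweet_count": tweet.retweet_count,
                "likes_count": tweet.favorite_count,
                "username": tweet.user.name,
                "screen_name": tweet.user.screen_name,
                "location": tweet.user.location,
                "friends_count": tweet.user.friends_count,
                "verified": tweet.user.verified,
                "description": tweet.user.description,
                "followers_count": tweet.user.followers_count,
            })
        # Ask for older tweets on the next request to avoid re-downloading the same page.
        max_id = results[-1].id - 1
    df = pd.DataFrame(records)
    if df.empty:
        return df
    # Drop duplicates so the same tweet never appears more than once in the final CSV.
    return df.drop_duplicates(subset="tweet_id")
```

Excluding retweets with the `-filter:retweets` search operator and de-duplicating on the tweet id covers both requirements above, even when paginated requests overlap.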
Create a MongoDB database called Tweets_db and store the extracted tweets in a collection named raw_tweets.
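A possible helper for this step, assuming pymongo and a local MongoDB instance (the connection URI and the helper name store_in_mongodb are illustrative, not part of the project spec):

```python
import pandas as pd
from pymongo import MongoClient


def store_in_mongodb(df: pd.DataFrame, uri: str = "mongodb://localhost:27017/") -> int:
    """Insert the extracted tweets into the raw_tweets collection of Tweets_db."""
    client = MongoClient(uri)
    collection = client["Tweets_db"]["raw_tweets"]
    records = df.to_dict("records")
    if records:
        collection.insert_many(records)
    return len(records)
```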
- Twitter Developer account
Apply for a Twitter Developer account if you do not have one. You will need the credentials it provides to work with the Twitter API.
- Twitter API credentials
The project was developed using:
- Python 3.7.9
- Anaconda (conda)
- Tweepy
- Pymongo
- Pandas
Follow the steps below to set up the project.
Create a conda environment using the command:
conda create -n "env-name" python=3.7
Activate the environment using the command:
conda activate env-name
Install project packages using the command:
pip install -r requirements.txt
To store your access credentials (examples: API keys, Database access credentials), follow the steps below:
- Duplicate the .env.example file and name the copy .env
- Store your access credentials in it as needed
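A sketch of how the credentials could be read back into Python, assuming python-dotenv is installed and that the variable names below match the keys defined in your .env.example (both are assumptions):

```python
import os

import tweepy
from dotenv import load_dotenv  # assumes python-dotenv is listed in requirements.txt

# Load the key/value pairs from the .env file into the process environment.
load_dotenv()

# Variable names are illustrative; use whatever keys your .env.example defines.
api_key = os.getenv("TWITTER_API_KEY")
api_secret = os.getenv("TWITTER_API_SECRET")
access_token = os.getenv("TWITTER_ACCESS_TOKEN")
access_token_secret = os.getenv("TWITTER_ACCESS_TOKEN_SECRET")

# Authenticate against the standard (v1.1) API with the loaded credentials.
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit=True)
```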