1. Set up a venv and install the requirements.
2. Download a copy of your Twitter archive.
3. Load the tweets from the archive: `./tbeat.py tweet.js my-tweets`
4. Copy `tokens.example.json` to `tokens.json` and fill in your API details.
5. Use a cron job / systemd timer to periodically run `./tbeat.py api my-tweets` and keep your tweets updated (a sample crontab entry follows this list).
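For step 5, a minimal crontab sketch, assuming the repo is checked out at `~/tbeat` with the venv inside it (both paths are hypothetical; adjust to your setup):

```sh
# Poll the API every 30 minutes (hypothetical paths)
*/30 * * * * cd ~/tbeat && ./venv/bin/python tbeat.py api my-tweets
```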
The Twitter API has strict rate limits. It is strongly recommended that you download a copy of your existing tweets and load them into Elasticsearch as described in steps 2 and 3 above, rather than fetching all of your tweets from the API. With your existing tweets loaded into Elasticsearch, the script only fetches tweets newer than the last tweet in the database. On hitting a rate limit, the script pauses for 15 minutes and retries.
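The pause-and-retry behavior amounts to something like the following sketch (the exception type and the `fetch_page` callable are hypothetical stand-ins, not tbeat's actual internals):

```python
import time

class RateLimited(Exception):
    """Hypothetical marker for an HTTP 429 response from the Twitter API."""

def fetch_new_tweets(fetch_page, since_id):
    # fetch_page is a hypothetical callable that returns tweets newer than since_id.
    while True:
        try:
            return fetch_page(since_id=since_id)
        except RateLimited:
            time.sleep(15 * 60)  # wait out the 15-minute rate-limit window, then retry
```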
A `tweet` object in the Twitter Archive (`data/tweet.js`) used to be identical to its counterpart returned by the Twitter API. However, some time between late 2017 and early 2020, the Archive version diverged from its API counterpart. The Archive version lacks a few dict keys, namely:

- The `user` dict, which contains information about the tweet author, like `user.id` and `user.screen_name`.
- The `retweeted` bool and the `retweeted_status` dict. In the API version, `retweeted_status` embeds the original tweet in the form of another `tweet` object. In the archive version, however, the `retweeted` bool is always `false`.
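A minimal sketch of telling the two flavors apart and unwrapping retweets, assuming the tweets have already been parsed into Python dicts (both function names are hypothetical, not part of tbeat):

```python
def is_api_style(tweet: dict) -> bool:
    # API-style objects carry the author's `user` dict; post-2017 archives don't.
    return "user" in tweet

def unwrap_retweet(tweet: dict) -> dict:
    # For API-style retweets, return the embedded original tweet; otherwise
    # return the tweet unchanged (archive retweets carry no retweeted_status).
    return tweet.get("retweeted_status", tweet)
```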
If you happen to have an archive file with the fuller data structure, consider ingesting it before ingesting archive files downloaded later.
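For example, with two hypothetical archive directories, the older (fuller) one would be loaded first:

```sh
./tbeat.py archive-2017/data/tweet.js my-tweets   # pre-divergence archive, still has user/retweeted_status
./tbeat.py archive-2021/data/tweet.js my-tweets   # newer archive with the reduced structure
```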