Extract GitHub repository metadata and README content.
STEPS:
- Environment setup and package installation:
    cp .env.example .env
    python3 -m venv env
    source env/bin/activate
    pip install --upgrade pip
    pip install -r requirements.txt
- Update the .env file with the correct parameters.
- Run the following scripts:
  i. python crawl_repos.py <topic-name> <stars-size>
     Crawls all repos that have the given topic and at least <stars-size> stars. If <stars-size> is omitted, repos with 0+ stars are considered.
  ii. python get_contributors.py
     Crawls all users who contributed to the repos crawled in step i.
  iii. python get_stargazers.py
     Crawls all users who starred the repos crawled in step i.
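
As a rough illustration of the first script's interface, the sketch below shows one way the topic and optional star threshold could be parsed and turned into a GitHub repository search request. It is not the actual crawl_repos.py: the function names (parse_args, build_search_query, build_search_url) and the paging defaults are assumptions made for this example. The `topic:` and `stars:>=` qualifiers and the `/search/repositories` endpoint are part of GitHub's public search API.

```python
import argparse
import urllib.parse

# GitHub REST endpoint for repository search.
SEARCH_URL = "https://api.github.com/search/repositories"


def parse_args(argv=None):
    """Parse <topic-name> and an optional <stars-size> (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Crawl repos by topic and minimum star count."
    )
    parser.add_argument("topic", help="GitHub topic name, e.g. machine-learning")
    # nargs="?" makes the argument optional; it falls back to 0 when omitted,
    # matching the "0+ stars" behaviour described above.
    parser.add_argument("min_stars", nargs="?", type=int, default=0,
                        help="minimum number of stars (default: 0)")
    return parser.parse_args(argv)


def build_search_query(topic, min_stars=0):
    """Combine the topic and star qualifiers into one search query string."""
    return f"topic:{topic} stars:>={min_stars}"


def build_search_url(topic, min_stars=0, page=1, per_page=100):
    """Build a URL-encoded search request for one page of results.

    per_page=100 is the largest page size the search API accepts;
    the paging scheme itself is an assumption of this sketch.
    """
    params = {
        "q": build_search_query(topic, min_stars),
        "per_page": per_page,
        "page": page,
    }
    return f"{SEARCH_URL}?{urllib.parse.urlencode(params)}"
```

For example, build_search_url(args.topic, args.min_stars) with args = parse_args(["machine-learning", "100"]) yields a request for repos tagged machine-learning with at least 100 stars, while parse_args(["machine-learning"]) leaves min_stars at 0.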