Extract GitHub repository metadata and README content.
STEPS:
- Environment setup and package installation, using either venv or conda:
```sh
python3 -m venv env
source env/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
# when finished
deactivate
```
```sh
conda env create -f conda.yaml
conda activate crawler
# when finished
conda deactivate
```
- Update the `.env` file with the correct params: copy the template with `cp .env.example .env`, then open it for editing with `code .env`.
- Run the following scripts:
  i. `python crawl_repos.py <topic-name> <stars-size>` to crawl all the repos with the given topic and a star count greater than or equal to `<stars-size>`. If `<stars-size>` is omitted, repos with 0+ stars are considered.
  ii. `python get_contributors.py` to crawl all the users who contributed to the repos crawled in step 3.i.
  iii. `python get_stargazers.py` to crawl all the users who starred the repos crawled in step 3.i.
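
To make the `<topic-name>` and `<stars-size>` arguments concrete, here is a minimal sketch of how they might be turned into a GitHub search query. It assumes the crawler relies on GitHub's search qualifiers (`topic:` and `stars:>=`); the helper names `parse_args` and `build_search_query` are hypothetical and not taken from `crawl_repos.py`.

```python
def parse_args(argv):
    """Parse the command-line arguments (hypothetical helper).

    argv holds the values after the script name: the topic, then an
    optional minimum star count that defaults to 0 when omitted.
    """
    if not argv:
        raise ValueError("usage: python crawl_repos.py <topic-name> [<stars-size>]")
    topic = argv[0]
    min_stars = int(argv[1]) if len(argv) > 1 else 0
    return topic, min_stars


def build_search_query(topic, min_stars=0):
    """Build a GitHub search query (assumed, not confirmed, to match the script)."""
    return f"topic:{topic} stars:>={min_stars}"


# Example: the query for repos tagged "machine-learning" with at least 100 stars.
topic, min_stars = parse_args(["machine-learning", "100"])
query = build_search_query(topic, min_stars)
```

With these assumptions, running step i with only a topic would produce a query ending in `stars:>=0`, which matches the 0+ stars default described above.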