A crawler that retrieves information about GitHub users through the official GitHub APIs.
Copyright (c) 2018 by Jiayun Zhang, Fudan University ([email protected])
We have tested our crawler on macOS High Sierra 10.13.1. Please make sure that you have installed Python 3.6.7 and Requests (pip install requests).
- Change directory to crawler
- Manually fill in the authorization token in config.py
- Run the crawler using the bash command:
python main.py TOTAL_USER OUTPUT_PATH
TOTAL_USER is the total number of users you want to retrieve.
OUTPUT_PATH is the path where you want to store the data.
Example:
python main.py 10 data.txt
The crawler will collect the following information:
- The basic information of the user by User API (https://api.github.com/user/:id)
- The detailed follower list of the user by Follower API
- The detailed following list of the user by Following API
- The repository list of the user by Repository API
- The commit logs of the user by GitHub Search API
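As an illustration of the first item in the list above, here is a minimal sketch of a User API request made with Requests. It is not the code in main.py: the helper names (user_url, auth_headers, fetch_user) are hypothetical, and the token is assumed to be the one you placed in config.py.

```python
API_ROOT = "https://api.github.com"


def user_url(user_id):
    """Build the User API URL for a numeric account ID."""
    return f"{API_ROOT}/user/{user_id}"


def auth_headers(token):
    """Build request headers carrying the authorization token."""
    return {
        "Authorization": f"token {token}",
        "Accept": "application/vnd.github+json",
    }


def fetch_user(user_id, token):
    """Fetch the basic information of one user as a dict (hypothetical helper)."""
    # Imported here so the helpers above can be used without Requests installed.
    import requests

    response = requests.get(user_url(user_id), headers=auth_headers(token), timeout=10)
    response.raise_for_status()  # raise on HTTP errors such as 401 or 404
    return response.json()
```

Calling fetch_user(1, token) with a valid token should return the JSON object for the account whose ID is 1. The other lists (followers, following, repositories) can be requested in the same way from their own endpoints.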
Each user entry is stored in JSON format.
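The output file can be read back with the standard json module. The sketch below assumes one JSON object per line, which this README does not confirm; check your own output file and adjust if the layout differs. The function name load_users is hypothetical.

```python
import json


def load_users(path):
    """Read user entries from a file, assuming one JSON object per line."""
    users = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                users.append(json.loads(line))
    return users
```

For example, load_users("data.txt") would return a list of dicts, one per collected user.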
See the LICENSE file for license rights and limitations (MIT).