Commit 8073a22

add githubHot.py
1 parent 53236d9 commit 8073a22

2 files changed, 23 additions and 0 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -17,3 +17,5 @@
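 ##### 8. [ECUT_pos_html.py](https://github.com/Fenghuapiao/PythonCrawler/blob/master/ECUT_pos_html.py): Scrapes all campus recruitment postings from the school's official website and saves them as HTML, with the images embedded in the HTML.

 ##### 9. [ECUT_get_grade.py](https://github.com/Fenghuapiao/PythonCrawler/blob/master/ECUT_get_grade.py): Simulates logging in to the school's official website, scrapes the grades, and computes the credit-weighted grade average.
+
+##### 10. [githubHot.py](https://github.com/Fenghuapiao/PythonCrawler/blob/master/githubHot.py): Scrapes the trending GitHub projects for a given language and saves each project's description and homepage URL to a local file.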
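
The README entry for githubHot.py describes saving each project's description and homepage URL to a local file. As a rough illustration of that output step, here is a minimal sketch that uses only the standard-library `csv` module. The commit itself uses pandas; the helper name `write_rows` and the sample row are hypothetical, and the header strings are the column names used in githubHot.py.

```python
import csv
import io


def write_rows(rows, fileobj):
    """Write (description, url) pairs as CSV, preceded by a header row.

    `write_rows` is a hypothetical helper, not part of the commit.
    """
    writer = csv.writer(fileobj)
    # Same column names as githubHot.py: "project description", "project URL".
    writer.writerow(["项目简介", "项目地址"])
    writer.writerows(rows)


# Write to an in-memory buffer so the sketch runs without touching disk.
buf = io.StringIO()
write_rows([("A sample project", "https://github.com/example/repo")], buf)
csv_text = buf.getvalue()
print(csv_text)
```

To write a real file instead, passing `open("github_hot.csv", "w", newline="", encoding="utf-8")` as `fileobj` should work, since the `csv` module expects files opened with `newline=""`.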

githubHot.py

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+import re
+import requests
+import pandas as pd
+import numpy as np  # imported but not used below
+
+def hot_github(keyword):
+    url = 'https://github.com/trending/{0}'.format(keyword)
+    main_url = 'https://github.com{0}'
+    html = requests.get(url).content.decode('utf-8')
+    reg_hot_url = re.compile(r'<h3 class="repo-list-name">\s*<a href="(.*?)">')
+    hot_url = [main_url.format(i) for i in re.findall(reg_hot_url, html)]
+    url_abstract_reg = re.compile(r'<p class="repo-list-description">\s*(.*?)\s*</p>')
+    summary_text = re.findall(url_abstract_reg, html)
+    hotDF = pd.DataFrame()
+    hotDF['项目简介'] = summary_text  # column name: "project description"
+    hotDF['项目地址'] = hot_url  # column name: "project URL"
+    hotDF.to_csv('./github_hot.csv', index=False)
+
+if __name__ == '__main__':
+    keyword = input('请输入查找的热门语言:')  # prompt: "Enter the trending language to look up:"
+    hot_github(keyword)
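
The function above fetches the page, extracts links and descriptions with two regular expressions, and writes them out in a single step. A minimal sketch of the parsing step alone is given below so that it can be exercised without network access. The regular expressions are the ones from the commit; the function name `parse_trending`, the `base` parameter, and the `SAMPLE` markup are assumptions made for illustration, and GitHub's live markup may not match these patterns. Because a repository without a description paragraph would leave the two result lists with different lengths, the sketch raises an error rather than pairing them up incorrectly.

```python
import re

# Patterns taken from the commit, written as raw strings.
REPO_LINK = re.compile(r'<h3 class="repo-list-name">\s*<a href="(.*?)">')
REPO_DESC = re.compile(r'<p class="repo-list-description">\s*(.*?)\s*</p>')


def parse_trending(html, base="https://github.com"):
    """Return a list of (description, url) pairs found in `html`.

    Hypothetical helper: it assumes the markup the commit's patterns target.
    """
    urls = [base + path for path in REPO_LINK.findall(html)]
    descriptions = REPO_DESC.findall(html)
    if len(urls) != len(descriptions):
        # Pairing by position would misalign rows, so fail loudly instead.
        raise ValueError(
            "found {0} links but {1} descriptions".format(
                len(urls), len(descriptions)
            )
        )
    return list(zip(descriptions, urls))


# Made-up markup shaped to match the patterns above.
SAMPLE = """
<h3 class="repo-list-name">
  <a href="/example/repo">example/repo</a>
</h3>
<p class="repo-list-description">
  A sample project
</p>
"""

rows = parse_trending(SAMPLE)
print(rows)
```

Keeping the parsing separate from `requests.get` and from the CSV output makes it possible to check the patterns against a saved copy of the page before running the full script.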
