2 files changed: +23 -0 lines changed
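##### 8. [ECUT_pos_html.py](https://github.com/Fenghuapiao/PythonCrawler/blob/master/ECUT_pos_html.py): Scrapes all campus recruitment postings from the school's official website and saves them as HTML, with the images embedded in the HTML.

##### 9. [ECUT_get_grade.py](https://github.com/Fenghuapiao/PythonCrawler/blob/master/ECUT_get_grade.py): Simulates logging in to the school's official website, scrapes the grades, and computes the credit-weighted average grade.
+
+ ##### 10. [githubHot.py](https://github.com/Fenghuapiao/PythonCrawler/blob/master/githubHot.py): Scrapes the trending GitHub projects for a given language and saves each project's description and homepage URL to a local file.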
githubHot.py
+ import re
+
+ import pandas as pd
+ import requests
+
+
+ def hot_github(keyword):
+     # Build the trending-page URL for the requested language.
+     url = 'https://github.com/trending/{0}'.format(keyword)
+     main_url = 'https://github.com{0}'
+     response = requests.get(url, timeout=10)
+     response.raise_for_status()  # fail early on HTTP errors
+     html = response.content.decode('utf-8')
+     # Raw strings keep \s as a regex escape rather than a string escape.
+     reg_hot_url = re.compile(r'<h3 class="repo-list-name">\s*<a href="(.*?)">')
+     hot_url = [main_url.format(i) for i in reg_hot_url.findall(html)]
+     url_abstract_reg = re.compile(r'<p class="repo-list-description">\s*(.*?)\s*</p>')
+     summary_text = url_abstract_reg.findall(html)
+     # Both columns must have the same length, otherwise pandas raises ValueError.
+     hotDF = pd.DataFrame()
+     hotDF['description'] = summary_text
+     hotDF['url'] = hot_url
+     hotDF.to_csv('./github_hot.csv', index=False)
+
+
+ if __name__ == '__main__':
+     keyword = input('Enter the trending language to look up: ')
+     hot_github(keyword)
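
The two regular expressions carry most of the logic in this script, so it can help to check them without touching the network. The following is a minimal sketch, not part of the commit: `parse_trending` and `SAMPLE_HTML` are hypothetical names, and the HTML fragment is made up to match the class names the patterns expect. GitHub's live markup may no longer use these class names, in which case the patterns would return empty lists. The sketch also adds `re.S` to the description pattern, an assumption that a description may span more than one line.

```python
import re

# Patterns taken from the diff above. They assume the trending page uses
# these class names, which may not hold for GitHub's current markup.
REPO_URL_RE = re.compile(r'<h3 class="repo-list-name">\s*<a href="(.*?)">')
# re.S (an addition, not in the commit) lets a description span several lines.
REPO_DESC_RE = re.compile(
    r'<p class="repo-list-description">\s*(.*?)\s*</p>', re.S
)


def parse_trending(html, base_url='https://github.com'):
    """Return (description, url) pairs found in a trending-page HTML string."""
    urls = [base_url + path for path in REPO_URL_RE.findall(html)]
    descriptions = REPO_DESC_RE.findall(html)
    # zip() stops at the shorter list, so a repository without a description
    # cannot cause a length-mismatch error (later pairs may misalign, though).
    return list(zip(descriptions, urls))


# Hypothetical HTML fragment, written only to exercise the patterns.
SAMPLE_HTML = '''
<h3 class="repo-list-name">
  <a href="/octocat/hello-world">octocat/hello-world</a>
</h3>
<p class="repo-list-description">
  A sample repository.
</p>
'''

rows = parse_trending(SAMPLE_HTML)
print(rows)
```

Keeping the parsing in a pure function means the fetch step (`requests.get`) and the export step (`to_csv`) can change independently, and an empty result makes a markup change on the page easy to spot.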