PythonCrawer

Self-study of the web crawler book Python Web Scraping, Second Edition.
To get the official code:
https://bitbucket.org/wswp/code/src/9e6b82b47087c2ada0e9fdf4f5e037e151975f0f/ or possibly https://github.com/kjam/wswp (this URL does not appear in the Chinese edition)
English PDF: https://www.dlutkn.top/2018/11/14/Ebooks-about-ML-Python/pws.pdf
linkCrawler.py : chapter1
scraping.py : chapter2
DownloadCache.py : chapter3
Concurrency.py : chapter4
Note: values stored in the Redis DB show up as escaped bytes like \xe1t; the Chinese text is only readable after decoding in code.
DynamicContent.py : chapter5
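biliCrawer.py : further attempt at crawling some Bilibili user info, multithreaded version
test.py : used for testing individual features
userAgents.txt : from https://github.com/airingursb/bilibili-user, intended to prevent network problems during long crawls (?)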
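
The Redis note for Concurrency.py above can be illustrated with a minimal sketch. It assumes the stored values are UTF-8 encoded; the sample bytes and the helper name `to_text` are made up for illustration and are not taken from the repo.

```python
# Hypothetical sample: the UTF-8 bytes a Redis client would hand back
# for a Chinese string when responses are not decoded automatically.
raw = b"\xe4\xb8\xad\xe6\x96\x87"


def to_text(value, encoding="utf-8"):
    """Decode bytes to str; pass any other value through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(repr(raw))     # the escaped form, similar to what the DB view shows
print(to_text(raw))  # the readable Chinese text
```

If the client is redis-py, passing `decode_responses=True` to the client constructor should make it return `str` directly, which would remove the need for a manual decode step.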

qidian/books.csv : book info downloaded from qidian.com
bili/saved/biliinfo(Num).xls : the crawled data
TODO: save the number of rows already written to Excel; keep trying to speed up crawling (??)
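
One possible reading of the first TODO item is that the row count should be persisted, so a restarted crawl can continue writing at the right row. The sketch below shows that idea only; the state file name `crawl_state.json` and both function names are assumptions, not part of the repo.

```python
import json
import os

STATE_FILE = "crawl_state.json"  # hypothetical file name, not in the repo


def load_row_count(path=STATE_FILE):
    """Return the saved row count, or 0 when no state file exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("rows", 0)


def save_row_count(rows, path=STATE_FILE):
    """Persist the number of rows written so far."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"rows": rows}, f)
```

A crawler could call `load_row_count()` at startup to pick the next row to write, and `save_row_count(n)` after each batch is saved.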

data/countries_or_districts.csv : info from the example URL
