Self-study code for the crawler book (Python Web Scraping, Second Edition)
To get the official code:
https://bitbucket.org/wswp/code/src/9e6b82b47087c2ada0e9fdf4f5e037e151975f0f/
or possibly https://github.com/kjam/wswp (this URL does not appear in the Chinese edition)
English PDF: https://www.dlutkn.top/2018/11/14/Ebooks-about-ML-Python/pws.pdf
linkCrawer.py : chapter1
scraping.py : chapter2
DownloadCache.py : chapter3
Concurrency.py : chapter4 (values in the Redis db show up as escaped bytes like \xe1t; the Chinese text is only readable after decoding in code)
DynamicContent.py : chapter5
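biliCrawer.py : further attempt at scraping some Bilibili user info, multithreaded version
test.py : for testing individual features in isolation =.=
userAgents.txt : from https://github.com/airingursb/bilibili-user, meant to prevent network problems during long crawls (?)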
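How userAgents.txt might be used is sketched below: load the file once, then pick a random User-Agent for each request. This is only an illustration, not the code in biliCrawer.py. It assumes the file holds one User-Agent string per line, and the helper names `load_user_agents` and `random_headers` are hypothetical.

```python
import random


def load_user_agents(path="userAgents.txt"):
    """Read one User-Agent string per line, skipping blank lines.

    Assumption: the file format is one User-Agent per line.
    """
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def random_headers(user_agents):
    """Build request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(user_agents)}
```

The returned dict can be passed as the `headers` argument of whatever HTTP client the crawler uses, so that consecutive requests do not all carry the same User-Agent.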
qidian/books.csv
book info downloaded from qidian.com
bili/saved/biliinfo(Num).xls
the scraped data
TODO:
Save the number of rows already written to the Excel file
Keep trying to speed up the crawl (??)
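One common way to speed up an I/O-bound crawl is a thread pool. The sketch below is not taken from biliCrawer.py: `crawl_all` is a hypothetical helper, and `fetch` stands for any function that downloads and parses one user's info. A sensible worker count depends on how much load the target site tolerates.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_all(user_ids, fetch, max_workers=8):
    """Call fetch(user_id) for every id on a thread pool.

    Returns a dict mapping each id to its result. A fetch that raises
    is recorded as None so one bad id does not stop the whole crawl.
    """
    user_ids = list(user_ids)  # allow generators; we iterate twice below

    def safe_fetch(user_id):
        try:
            return fetch(user_id)
        except Exception:
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input ids
        results = list(pool.map(safe_fetch, user_ids))
    return dict(zip(user_ids, results))
```

Because failures are stored as None, the ids that need a retry can be collected afterwards with `[uid for uid, info in result.items() if info is None]`.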
data/countries_or_districts.csv
info from the example URL
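The note on Concurrency.py says that values read back from Redis look like \xe1t until they are decoded. A minimal sketch of that decoding step follows. It assumes the values were stored as UTF-8, and `decode_value` is a hypothetical helper, not a function from this repository.

```python
def decode_value(value, encoding="utf-8"):
    """Return str for bytes input; pass any other value through unchanged.

    Assumption: the bytes were written to Redis as UTF-8 text.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = "中文".encode("utf-8")  # stands in for the bytes a Redis client returns
print(raw)                    # b'\xe4\xb8\xad\xe6\x96\x87'
print(decode_value(raw))      # 中文
```

With the redis-py client, passing `decode_responses=True` when creating the connection should make it return str instead of bytes, which removes the need for a manual decode.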
luvSeohyun/PythonCrawer