
luvSeohyun/PythonCrawer


PythonCrawer

Self-study notes for the book Python Web Scraping, Second Edition.
Official code:
https://bitbucket.org/wswp/code/src/9e6b82b47087c2ada0e9fdf4f5e037e151975f0f/ or possibly https://github.com/kjam/wswp (this URL does not appear in the Chinese edition)
English PDF: https://www.dlutkn.top/2018/11/14/Ebooks-about-ML-Python/pws.pdf
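linkCrawer.py : chapter 1
scraping.py : chapter 2
DownloadCache.py : chapter 3
Concurrency.py : chapter 4. Values in the Redis DB show up as escaped bytes like \xe1t; Chinese text is readable only after decoding in code
DynamicContent.py : chapter 5
biliCrawer.py : continued attempt at crawling some bilibili user info, multithreaded version
test.py : for testing individual features =.=
userAgents.txt : from https://github.com/airingursb/bilibili-user; used to avoid network problems during long crawls (?)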
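
The note on Concurrency.py (escaped bytes until decoded) can be illustrated with a minimal sketch. The sample bytes and the helper name `decode_value` are assumptions for illustration and are not taken from the repository; the sketch only shows that a UTF-8 byte string becomes readable Chinese once it is decoded. If the code uses the redis-py client, passing `decode_responses=True` when creating the client is another way to receive `str` values directly.

```python
def decode_value(raw, encoding="utf-8"):
    """Return a readable str for a value read from a byte-oriented store.

    Hypothetical helper: bytes are decoded, str and None pass through unchanged.
    """
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Assumed sample: the UTF-8 encoding of the two characters "中文".
raw = b"\xe4\xb8\xad\xe6\x96\x87"

print(raw)                # shown as escaped bytes
print(decode_value(raw))  # shown as readable text
```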


qidian/books.csv : book info downloaded from qidian.com
bili/saved/biliinfo(Num).xls : crawled data
TODO: save the number of rows already written to Excel; keep trying to speed up crawling (??)
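
One possible way to approach the first TODO item is to persist the row count in a small side file, so a later run knows where to continue. This is only a sketch under assumptions: the function names, the JSON format, and the `rows_written` key are hypothetical, and nothing here reflects how biliCrawer.py actually writes its .xls files.

```python
import json
from pathlib import Path


def load_row_count(path):
    """Return the saved row count, or 0 if no checkpoint file exists yet."""
    p = Path(path)
    if not p.exists():
        return 0
    data = json.loads(p.read_text(encoding="utf-8"))
    return data.get("rows_written", 0)


def save_row_count(path, rows_written):
    """Write the current row count to the checkpoint file, replacing any old value."""
    payload = json.dumps({"rows_written": rows_written})
    Path(path).write_text(payload, encoding="utf-8")
```

A crawler could call `load_row_count` once at startup and `save_row_count` after each batch of rows is written.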


data/countries_or_districts.csv : info from the example URL
