mm_crawler
==========

This crawler downloads photo galleries from 22mm.cc.

Command-line options:

* `-h` prints usage help
* `-n` sets the number of concurrent threads (default: 10)
* `-o` sets the directory images are saved to (default: a `pics` directory under the current working directory)
* `-l` limits how many images are downloaded before the crawler stops (default: unlimited)

Crawl flow:

1. Seed `category_url_queue` with the site's homepage URL.
2. Loop: take a URL from `category_url_queue`, open the page, and regex-match all links containing `/mm/`.
3. Matched `/mm/` URLs come in two kinds: category pages and photo-set pages. Push them onto `category_url_queue` and `detail_url_queue` respectively.
4. Loop: take a URL from `detail_url_queue` and download that photo set. After each image, check whether the download limit has been reached; if so, signal all threads to exit.

Example run:

    mm_crawler -l 5 -o /tmp/pics -l 200
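The flow above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the helper names (`classify_mm_url`, `extract_mm_urls`), the regex patterns, and the assumption that photo-set pages end in a numeric id like `/mm/123.html` are all hypothetical. The page fetcher is injected so the sketch can run without network access.

```python
import argparse
import queue
import re
import threading

# Hypothetical link pattern: any href containing /mm/ (assumption, not the repo's regex).
MM_URL_RE = re.compile(r'href="(?P<url>[^"]*/mm/[^"]*)"')


def parse_args(argv=None):
    """CLI flags matching the README: -n threads, -o output dir, -l limit."""
    parser = argparse.ArgumentParser(description="22mm.cc image crawler (sketch)")
    parser.add_argument("-n", type=int, default=10, help="concurrent threads (default 10)")
    parser.add_argument("-o", default="./pics", help="image output directory")
    parser.add_argument("-l", type=int, default=0, help="stop after this many images (0 = unlimited)")
    return parser.parse_args(argv)


def classify_mm_url(url):
    """Step 3: decide whether a /mm/ link is a category page or a photo-set page.
    Assumption: photo-set (detail) pages end in a numeric id like /mm/123.html."""
    return "detail" if re.search(r"/mm/\d+\.html?$", url) else "category"


def extract_mm_urls(html):
    """Step 2: regex-match every link containing /mm/ in a fetched page."""
    return [m.group("url") for m in MM_URL_RE.finditer(html)]


def crawl(fetch, start_url, limit=0):
    """Steps 1-4 in one thread: seed the category queue, expand pages into the
    two queues, then drain the detail queue, setting a stop event once the -l
    limit is reached. `fetch(url)` returns page HTML."""
    category_url_queue = queue.Queue()
    detail_url_queue = queue.Queue()
    stop = threading.Event()  # the "exit signal" from step 4
    seen = set()
    downloaded = 0

    category_url_queue.put(start_url)  # step 1: seed with the homepage URL
    while not stop.is_set() and not category_url_queue.empty():
        page = fetch(category_url_queue.get())
        for url in extract_mm_urls(page):
            if url in seen:
                continue  # avoid re-crawling pages we have already queued
            seen.add(url)
            if classify_mm_url(url) == "category":
                category_url_queue.put(url)
            else:
                detail_url_queue.put(url)

    while not stop.is_set() and not detail_url_queue.empty():
        detail_url_queue.get()  # the real crawler downloads each image here
        downloaded += 1
        if limit and downloaded >= limit:
            stop.set()  # step 4: limit reached, tell all workers to exit
    return downloaded
```

In the real program the two loops would run in `-n` worker threads sharing the queues and the stop event; here they run sequentially so the control flow stays easy to follow.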