Name		Name	Last commit message	Last commit date
parent directory ..
search		search
spider		spider
.gitkeep		.gitkeep
README.md		README.md

README.md

1. Vue数据展示项目

1.1 直接http构造es查询，显示查询结果，提供web端查看

1.2 前端拼接hivesql，查询hive表数据

2. 爬虫系统

2.1 爬取数据后，走rabbitmq消息队列通信，数据文件爬取后上传到sftp，然后跑mapreduce任务创建hive表，上传到hdfs

2.2 定时调度爬虫系统

3. data-spider基本架构图

https://my-macro-oss.oss-cn-shenzhen.aliyuncs.com/mall/images/20200304/data-spider.png

4. 启动脚本

django搜索服务
source /usr/local/python-3.6.2/envs/scrapytest/bin/activate
cd /usr/local/scrapy/search
python3 manage.py runserver 0.0.0.0:8000

#启动scrapy后台服务
cd /usr/local/scrapy/spider
/usr/local/python-3.6.2/envs/scrapytest/bin/scrapyd &

#查看scrapyd
netstat -tlnp | grep 6800

#部署spider到scrapy
/usr/local/python-3.6.2/envs/scrapytest/bin/scrapyd-deploy Myploy -p ArticleSpider

#启动爬虫
curl http://120.79.159.59:6800/schedule.json -d project=ArticleSpider -d spider=zhihu
curl http://120.79.159.59:6800/schedule.json -d project=ArticleSpider -d spider=lagou
curl http://120.79.159.59:6800/schedule.json -d project=ArticleSpider -d spider=jobbole

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crawlerData

crawlerData

README.md

1. Vue数据展示项目

2. 爬虫系统

3. data-spider基本架构图

4. 启动脚本

Files

crawlerData

Directory actions

More options

Directory actions

More options

Latest commit

History

crawlerData

Folders and files

parent directory

README.md

1. Vue数据展示项目

2. 爬虫系统

3. data-spider基本架构图

4. 启动脚本