
Nemoden/pyspider

This branch is 936 commits behind binux/pyspider:master.

Latest commit

a4c34a2 · Nov 27, 2014

pyspider

A Powerful Spider (Web Crawler) System in Python. Try It Now!

  • Write scripts in Python with a powerful API
  • Powerful WebUI with script editor, task monitor, project manager and result viewer
  • MySQL, MongoDB or SQLite as the database backend
  • JavaScript pages supported!
  • Task priority, retry, periodic crawling, and recrawl by age or by marks in the index page (such as update time)
  • Distributed architecture

Sample Code:

from libs.base_handler import *

class Handler(BaseHandler):
    '''
    this is a sample handler
    '''
    # Run on_start once every 24 hours.
    @every(minutes=24*60, seconds=0)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched index page as valid for 10 days before recrawling it.
    @config(age=10*24*60*60)
    def index_page(self, response):
        # Follow every link whose href starts with "http://".
        for each in response.doc('a[href^="http://"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is the result stored for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
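
The `index_page` callback in the sample selects links with the CSS selector `a[href^="http://"]`, that is, anchors whose `href` starts with `http://`. The sketch below reproduces just that filtering step with the Python 3 standard library so it can be tried without pyspider. It is an illustration, not how pyspider parses pages: `AbsoluteLinkParser`, `extract_links` and the sample HTML are hypothetical names and data.

```python
from html.parser import HTMLParser


class AbsoluteLinkParser(HTMLParser):
    """Collects href values of <a> tags that start with "http://"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Mirrors the prefix match of a[href^="http://"].
        if href and href.startswith("http://"):
            self.links.append(href)


def extract_links(html):
    """Return the matching links in document order."""
    parser = AbsoluteLinkParser()
    parser.feed(html)
    return parser.links


SAMPLE = (
    '<a href="http://a.example/">A</a>'
    '<a href="/relative">B</a>'
    '<a href="https://b.example/">C</a>'
    '<a>D</a>'
)
links = extract_links(SAMPLE)  # only the first anchor matches
```

Note that relative links and `https://` links are not matched by this prefix, so a handler that should follow them would need a broader selector.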


Installation

  • Python 2.6 or 2.7 (Windows is not currently supported)
  • pip install --allow-all-external -r requirements.txt
  • Run ./run.py, then visit http://localhost:5000/

On Ubuntu, install the system packages with: apt-get install python python-dev python-distribute python-pip libcurl4-openssl-dev libxml2-dev libxslt1-dev python-lxml

Running with Docker

Documents

Contribute

License

Licensed under the Apache License, Version 2.0

