icersummer/pyspider

This branch is 979 commits behind binux/pyspider:master.

Latest commit: 4ff873b · Nov 19, 2014

pyspider

A Powerful Spider System in Python. Try It Now!

  • Write scripts in Python with a powerful API
  • Powerful WebUI with a script editor, task monitor, project manager and result viewer
  • MySQL, MongoDB or SQLite as the database backend
  • JavaScript pages supported!
  • Task priority, retries, periodic crawling, and recrawling by age or by marks on the index page (such as update time)
  • Distributed architecture
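
To make the "task priority" feature concrete, here is a minimal sketch of a priority-ordered task queue built only on the standard library. It is not pyspider's scheduler: the `TaskQueue` class, its `put`/`get` methods and the example URLs are hypothetical names chosen for illustration. The only assumption is that a larger priority number means the task should be fetched sooner.

```python
import heapq
import itertools


class TaskQueue:
    """Hypothetical priority queue: higher priority is popped first."""

    def __init__(self):
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._counter = itertools.count()

    def put(self, url, priority=0):
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), url))

    def get(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


queue = TaskQueue()
queue.put("http://example.com/list", priority=1)
queue.put("http://example.com/detail/1", priority=5)
queue.put("http://example.com/detail/2", priority=5)

# Drain the queue; the two priority-5 tasks come out before the priority-1 task.
order = [queue.get()[0] for _ in range(len(queue))]
print(order)
```

A real scheduler would also have to persist tasks and handle retries and rate limits; the sketch only shows the ordering idea.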

Sample Code:

from libs.base_handler import *


class Handler(BaseHandler):
    '''
    This is a sample handler.
    '''
    # Run on_start once a day (24*60 minutes).
    @every(minutes=24*60, seconds=0)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched index page as valid for 10 days (age is in seconds).
    @config(age=10*24*60*60)
    def index_page(self, response):
        # Follow every link whose href starts with "http://".
        for each in response.doc('a[href^="http://"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is the result for this page.
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }
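
The two decorators in the sample carry time arithmetic that is easy to misread, so the following standalone sketch spells it out. It is an illustration only and does not use pyspider: `is_fresh` is a hypothetical helper, and it assumes that `age` means "a page crawled less than `age` seconds ago does not need to be crawled again".

```python
import time

DAY = 24 * 60 * 60  # seconds in one day


def is_fresh(last_crawl_time, age, now=None):
    """Hypothetical check: True while the last crawl is younger than `age` seconds."""
    if now is None:
        now = time.time()
    return (now - last_crawl_time) < age


# @every(minutes=24*60, seconds=0) expressed in seconds.
every_seconds = 24 * 60 * 60 + 0
# @config(age=10*24*60*60) is already in seconds.
age_seconds = 10 * 24 * 60 * 60

print(every_seconds // DAY)  # interval between on_start runs, in days
print(age_seconds // DAY)    # validity window of an index page, in days

# A page crawled at t=0 is still fresh one day later, but not after 11 days.
print(is_fresh(0, age_seconds, now=1 * DAY))
print(is_fresh(0, age_seconds, now=11 * DAY))
```

Under these assumptions, `on_start` fires daily, while an index page fetched within the last 10 days would be skipped rather than fetched again.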

Installation

On Ubuntu, install the system dependencies with: apt-get install python python-dev python-distribute python-pip libcurl4-openssl-dev libxml2-dev libxslt1-dev python-lxml

Alternatively, see Running with Docker.

Documents

Contribute

License

Licensed under the Apache License, Version 2.0

About

A Powerful Spider System with Web UI

Languages

  • Python 82.3%
  • JavaScript 10.8%
  • CSS 6.9%