Grab

Build status: https://travis-ci.org/lorien/grab.png?branch=master
Coverage: https://coveralls.io/repos/lorien/grab/badge.svg?branch=master
Documentation status: https://readthedocs.org/projects/grab/badge/?version=latest

What is Grab?

Grab is a Python web scraping framework. It provides a lot of helpful methods for scraping web sites and for processing the scraped content:

  • Automatic cookies (session) support
  • HTTP and SOCKS proxy with and without authorization
  • Keep-Alive support
  • IDN support
  • Tools to work with web forms
  • Easy multipart file uploading
  • Flexible customization of HTTP requests (see the sketch after this list)
  • Automatic charset detection
  • Powerful API for extracting data from HTML documents with XPath queries
  • Asynchronous API to make thousands of simultaneous queries. This part of the library is called Spider, and it is too big to even list its features in this README.
  • Python 3 ready
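
Below is a minimal sketch of the request customization and XPath extraction features listed above. It assumes the standard Grab configuration keys (user_agent, headers, timeout, connect_timeout, proxy, proxy_type, proxy_userpwd); the proxy address and target URL are placeholders, not part of the original README:

from grab import Grab

g = Grab()
# Customize the HTTP request: user agent, extra header, timeouts and
# an authorized HTTP proxy (placeholder address)
g.setup(
    user_agent='Mozilla/5.0 (compatible; ExampleBot/1.0)',
    headers={'Accept-Language': 'en-US'},
    timeout=30,
    connect_timeout=10,
    proxy='127.0.0.1:3128',
    proxy_type='http',
    proxy_userpwd='user:password',
)

# Fetch a page; cookies and charset detection are handled automatically
g.go('http://example.com')

# Extract data with an XPath query
print(g.doc.select('//title').text())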

Grab Example

import logging

from grab import Grab

logging.basicConfig(level=logging.DEBUG)

g = Grab()

# Log into GitHub by filling in and submitting the sign-in form
g.go('https://github.com/login')
g.doc.set_input('login', '****')
g.doc.set_input('password', '****')
g.doc.submit()

# Save the page for debugging and check that the sign-out button exists,
# i.e. that the login succeeded
g.doc.save('/tmp/x.html')

g.doc('//ul[@id="user-links"]//button[contains(@class, "signout")]').assert_exists()

# Extract the profile URL and list the user's repositories
home_url = g.doc('//a[contains(@class, "header-nav-link name")]/@href').text()
repo_url = home_url + '?tab=repositories'

g.go(repo_url)

for elem in g.doc.select('//h3[@class="repo-list-name"]/a'):
    print('%s: %s' % (elem.text(),
                      g.make_url_absolute(elem.attr('href'))))

Grab::Spider Example

import logging

from grab.spider import Spider, Task

logging.basicConfig(level=logging.DEBUG)


class ExampleSpider(Spider):
    def task_generator(self):
        # Generate the initial tasks: one Google search per language
        for lang in 'python', 'ruby', 'perl':
            url = 'https://www.google.com/search?q=%s' % lang
            yield Task('search', url=url, lang=lang)

    @staticmethod
    def task_search(grab, task):
        # Called for each downloaded search page; print the displayed URL
        # of the first search result
        print('%s: %s' % (task.lang,
                          grab.doc('//div[@class="s"]//cite').text()))


bot = ExampleSpider(thread_number=2)
bot.run()

Installation

Pip is the recommended way to install Grab and its dependencies:

$ pip install -U grab

See details here: http://docs.grablib.org/en/latest/usage/installation.html
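
A quick way to check that the installation worked is to fetch a page from a Python shell. This is only an illustrative snippet (example.com is a placeholder URL), not part of the official installation guide:

from grab import Grab

g = Grab()
g.go('http://example.com')  # any reachable URL works here
print(g.doc.code)           # HTTP status code, e.g. 200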

Documentation and Help

Documentation: http://docs.grablib.org/en/latest/

English mailing list: http://groups.google.com/group/grab-users/

Russian mailing list: http://groups.google.com/group/python-grab/

Contribution

To report a bug, please use the GitHub issue tracker: https://github.com/lorien/grab/issues

If you want to develop a new feature in Grab, please use the issue tracker to describe what you want to do, or contact me at [email protected]
