cfq20/portia

This branch is 3 commits behind scrapinghub/portia:master.

Portia

Portia is a tool that lets you visually scrape websites without any programming knowledge. With Portia you annotate a web page to identify the data you wish to extract, and Portia uses these annotations to work out how to scrape data from similar pages.

Running Portia

The easiest way to run Portia is with Docker. To start it from the official Portia image, run:

docker run -v ~/portia_projects:/app/data/projects:rw -p 9001:9001 scrapinghub/portia
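As a rough sketch of what that command does, the snippet below prepares the host directory and assembles the same command from two variables. The names PROJECTS_DIR, PORT and PORTIA_CMD are illustrative only and are not Portia settings. The default values match the command above. Creating the directory beforehand is an assumption about a typical Docker setup, where a missing bind-mount directory is otherwise created by the Docker daemon and may end up owned by root.

```shell
# Illustrative variables (not Portia settings); defaults match the README command.
PROJECTS_DIR="${PROJECTS_DIR:-$HOME/portia_projects}"
PORT="${PORT:-9001}"

# Create the host directory first so it belongs to the current user.
mkdir -p "$PROJECTS_DIR"

# -v mounts the host directory at /app/data/projects in the container (read-write).
# -p publishes container port 9001 on the chosen host port.
PORTIA_CMD="docker run -v $PROJECTS_DIR:/app/data/projects:rw -p $PORT:9001 scrapinghub/portia"

# Print the assembled command so it can be reviewed before running it.
printf '%s\n' "$PORTIA_CMD"
```

With the defaults, the printed command is the one shown above with ~ expanded to your home directory. Projects created in Portia should then persist in ~/portia_projects across container restarts, and the service should be reachable on host port 9001.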

You can also set up a local instance with docker-compose by cloning this repository and running the following from its root folder:

docker-compose up

For more detailed instructions, and alternatives to using Docker, see the Installation docs.

Documentation

Documentation is available on Read the Docs. The source files are in the docs directory.

About

Visual scraping for Scrapy
