The recommended way to install dependencies is to use a virtualenv and then run:
pip install -r requirements.txt
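For example, a minimal setup (the environment directory name is arbitrary):
$ virtualenv slyd-env
$ . slyd-env/bin/activate
$ pip install -r requirements.txt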
Run the server using:
twistd -n slyd
and point your browser to: http://localhost:9001/static/index.html
Chrome and Firefox are supported, but it works best in Chrome.
This will be moved to separate docs; for now it's some notes for developers.
All resources are either under /dist/ or /projects/.
project listing/creation/deletion/renaming
To list all existing projects, send a GET to http://localhost:9001/projects:
$ curl http://localhost:9001/projects
["project1", "project2"]
New projects can be created by POSTing to /projects, for example:
$ curl -d '{"cmd": "create", "args": ["project_X"]}' http://localhost:9001/projects
To delete a project:
$ curl -d '{"cmd": "rm", "args": ["project_X"]}' http://localhost:9001/projects
To rename a project:
$ curl -d '{"cmd": "mv", "args": ["oldname", "newname"]}' http://localhost:9001/projects
Please note that projects are not overwritten when renaming or creating new ones; if a project with the given name already exists, an error from the 400 family is returned.
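For example, attempting to create a project that already exists (the -i flag shows the returned status line; the exact 4xx code may vary):
$ curl -i -d '{"cmd": "create", "args": ["project1"]}' http://localhost:9001/projects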
spec
The project specification is available under /projects/PROJECT_ID/spec. The path format mirrors the slybot format documented here: http://slybot.readthedocs.org/en/latest/project.html
Spec resources can be read with GET requests; spiders can also be updated by POSTing, as described below.
The entire spec is returned for a GET request to the root:
$ curl http://localhost:9001/projects/78/spec
{"project": {
"version": "1308771278",
"name": "demo"
..
}
A list of available spiders can be retrieved:
$ curl http://localhost:9001/projects/78/spec/spiders
["accommodationforstudents.com", "food.com", "pinterest.com", "pin", "mhvillage"]
and specific resources can be requested:
$ curl http://localhost:9001/projects/78/spec/spiders/accommodationforstudents.com
{
    "templates": [...],
    ...
    "respect_nofollow": true
}
The spec can be updated by POSTing:
$ curl --data @newlinkedin.js http://localhost:9001/projects/78/spec/spiders/linkedin
An HTTP 400 will be returned if the uploaded spec does not validate.
Basic commands are available for manipulating spider files. For example:
$ curl -d '{"cmd": "rm", "args": ["spidername"]}' http://localhost:9001/projects/78/spec/spiders
Available commands are:
- mv - rename a spider from the first argument to the second; if the target already exists it is overwritten (see the example below)
- rm - delete a spider
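For example, to rename a spider (names are illustrative):
$ curl -d '{"cmd": "mv", "args": ["spidername", "newname"]}' http://localhost:9001/projects/78/spec/spiders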
bot/fetch
Accepts a json object with the following fields (a minimal example follows the list):
- request - same as a scrapy request object; at minimum it needs a url
- spider - the spider name within the project
- page_id - unique ID for this page, must match the id used in templates (not yet implemented)
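A minimal request body, for example (URL and spider name are illustrative):
{"request": {"url": "http://www.example.com/"}, "spider": "myspider"}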
Returns a json object containing (so far):
- page - the page content; not yet annotated, but it will be
- response - object containing the response data: http code and headers
- items - array of items extracted
- fp - request fingerprint
- error - error message, present if there was an error
- links - array of links followed
Coming soon in the response:
- template_id - id of template that matched
- trace - textual trace of the matching process - for debugging
If you want to work on an existing project, put it in data/projects/PROJECTID. Projects can be downloaded from dash or by running:
$ bin/sh2sly data/projects -p 78 -k YOURAPIKEY
Then you can extract data:
$ curl -d '{"request": {"url": "http://www.pinterest.com/pin/339740365610932893/"}, "spider": "pinterest.com"}' http://localhost:9001/projects/78/bot/fetch { "fp": "0f2686acdc6a71eeddc49045b7cea0b6f81e6b61", "items": [ { "url": "http://www.pinterest.com/pin/339740365610932893/", "_template": "527387aa4d6c7133c6551481", "image": [ "http://media-cache-ak0.pinimg.com/736x/6c/c5/35/6cc5352046df0f8d8852cbdfb31542bb.jpg" ], "_type": "pin", "name": [ "Career Driven" ] } ], "page": "\n ...." }
slyd can be tested using twisted's trial:
$ trial tests