nscrape

Extract structured data from HTML.

Using the CLI (Non-Interactive)

$ npm install -g nscrape
Write a spider (see below) for the website you want to scrape.
$ nscrape --spider my-spider.json --wait 3000 --nr-of-pages 1
Scraped items are printed to stdout as JSON.

Using gRPC (WIP)

TODO: include a persist-protobuf server implementation for interactive use.
Open question: move this to a different project?

Spider

$ tee ./spider-reddit.json <<EOF
{
    "name": "reddit-js",
    "paged": true,
    "baseUrl": "https://www.reddit.com/r/javascript",
    "itemTypes": [{
        "name": "NewsItem",
        "selector": ".Post",
        "properties": {
            "title": "h3",
            "votes": {
                "xpath": "child::*[1]"
            }
        }
    }]
}
EOF
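
The shape of the spider file above can be sketched as TypeScript types. This is a reading of the one example only, not nscrape's actual schema: the names `Spider`, `ItemType`, `PropertySelector`, and `describeSpider` are hypothetical, and treating a plain string property such as "h3" as a CSS selector is an assumption based on how the example looks.

```typescript
// Hypothetical types inferred from the example spider; not taken from nscrape itself.
// Assumption: a plain string is a CSS selector, an object with "xpath" is an XPath expression.
type PropertySelector = string | { xpath: string };

interface ItemType {
  name: string;                                  // e.g. "NewsItem"
  selector: string;                              // matches one element per scraped item
  properties: Record<string, PropertySelector>;  // evaluated relative to that element
}

interface Spider {
  name: string;
  paged?: boolean;       // assumed optional; the example sets it to true
  baseUrl: string;
  itemTypes: ItemType[];
}

// List each property of each item type together with the kind of selector it uses.
function describeSpider(spider: Spider): string[] {
  const lines: string[] = [];
  for (const itemType of spider.itemTypes) {
    for (const [prop, sel] of Object.entries(itemType.properties)) {
      const kind = typeof sel === "string" ? "css" : "xpath";
      const expr = typeof sel === "string" ? sel : sel.xpath;
      lines.push(`${itemType.name}.${prop} <- ${kind}: ${expr}`);
    }
  }
  return lines;
}

// The same spider as in the JSON example above.
const redditSpider: Spider = {
  name: "reddit-js",
  paged: true,
  baseUrl: "https://www.reddit.com/r/javascript",
  itemTypes: [
    {
      name: "NewsItem",
      selector: ".Post",
      properties: {
        title: "h3",
        votes: { xpath: "child::*[1]" },
      },
    },
  ],
};

console.log(describeSpider(redditSpider).join("\n"));
```

Typing a spider this way lets the compiler catch a misspelled key or a missing `selector` before the file is handed to the CLI. Whether nscrape accepts other fields or selector forms is not covered by this sketch.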
