
purplepapa/Storm-Crawl

 
 


storm-crawler

A collection of resources for building low-latency, large-scale web crawlers on Storm, available under the Apache License.

The released version is available from Maven Central:

<dependency>
    <groupId>com.digitalpebble</groupId>
    <artifactId>storm-crawler</artifactId>
    <version>0.2</version>
</dependency>

Alternatively, to build the current development version from source, install Maven and run mvn clean package to generate the full jar (with dependencies). Then, with Storm installed, run:

storm jar target/storm-crawler-0.3-SNAPSHOT-jar-with-dependencies.jar com.digitalpebble.storm.crawler.CrawlTopology -conf crawler-conf.yaml -local
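The command above passes two arguments to the main class: -conf, which names a YAML configuration file (here crawler-conf.yaml), and -local, which by the usual Storm convention presumably requests an in-process local cluster instead of submission to a remote one. As an illustration only, here is a minimal, standard-library-only Java sketch of how a launcher could interpret these two flags. The class LauncherArgsSketch and its Options type are hypothetical; this is not the actual CrawlTopology implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of how a topology launcher might interpret the
 * "-conf <file>" and "-local" command-line arguments.
 * Not the actual storm-crawler implementation.
 */
public class LauncherArgsSketch {

    /** Parsed launcher options (hypothetical). */
    static final class Options {
        final List<String> confFiles = new ArrayList<>();
        boolean local = false;
    }

    static Options parse(String[] args) {
        Options opts = new Options();
        for (int i = 0; i < args.length; i++) {
            if ("-conf".equals(args[i])) {
                // "-conf" must be followed by a configuration file name
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-conf requires a file name");
                }
                opts.confFiles.add(args[++i]);
            } else if ("-local".equals(args[i])) {
                // assumed meaning: run in an in-process local cluster
                opts.local = true;
            }
            // any other argument is ignored in this sketch
        }
        return opts;
    }

    public static void main(String[] args) {
        Options opts = parse(new String[] {"-conf", "crawler-conf.yaml", "-local"});
        System.out.println("config files: " + opts.confFiles);
        System.out.println("local mode: " + opts.local);
    }
}
```

Running main prints the parsed file list and the local-mode flag. Accepting -conf more than once, as this sketch does, is a design choice that would let several configuration files be layered, but nothing in this README states that the real launcher supports it.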

Mailing list: http://groups.google.com/group/digitalpebble

About

Storm-based web crawler


Languages

  • Java 100.0%