Lightning Fast and Elegant Scraping Framework for Gophers
Colly provides a clean interface to write any kind of crawler/scraper/spider.
With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.
- Clean API
- Fast (>1k request/sec on a single core)
- Manages request delays and maximum concurrency per domain
- Automatic cookie and session handling
- Sync/async/parallel scraping
- Caching
- Automatic encoding of non-unicode responses
func main() {
c := colly.NewCollector()
// Find and visit all links
c.OnHTML("a", func(e *colly.HTMLElement) {
link := e.Attr("href")
fmt.Println(link)
c.Visit(e.Request.AbsoluteURL(link))
})
c.Visit("https://en.wikipedia.org/")
}
See examples folder for more detailed examples.
Bugs or suggestions? Visit the issue tracker or join #colly
on freenode
Below is a list of public, open source projects that use Colly:
- greenpeace/check-my-pages Scraping script to test the Spanish Greenpeace web archive
If you are using Colly in a project please send a pull request to add it to the list.