forked from gocolly/colly

Elegant Scraper and Crawler Framework for Golang


siberianbear/colly


Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Documentation

Features

  • Clean API
  • Fast (>1k requests/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching

Example

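package main

import (
	"fmt"

	// Import path assumed from the upstream repository; adjust it for this fork.
	"github.com/gocolly/colly"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		link := e.Attr("href")
		fmt.Println(link)
		// Resolve relative links against the current page before visiting them
		c.Visit(e.Request.AbsoluteURL(link))
	})

	// Start crawling from the seed URL
	c.Visit("https://en.wikipedia.org/")
}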
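
The example above passes each `href` through `e.Request.AbsoluteURL` before visiting it, because links on a page are often relative. The following standard-library sketch shows the general idea of resolving a link against the page URL. It does not reproduce Colly's own code, and `absoluteURL` is a hypothetical helper introduced only for this illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// absoluteURL resolves href against the page URL base and returns the
// absolute form. It returns "" when either input fails to parse.
// Hypothetical helper for illustration; not Colly's implementation.
func absoluteURL(base, href string) string {
	b, err := url.Parse(base)
	if err != nil {
		return ""
	}
	ref, err := url.Parse(href)
	if err != nil {
		return ""
	}
	return b.ResolveReference(ref).String()
}

func main() {
	page := "https://en.wikipedia.org/wiki/Main_Page"
	hrefs := []string{"/wiki/Go", "Web_scraping", "https://example.com/", "#top"}
	for _, href := range hrefs {
		fmt.Printf("%-22s -> %s\n", href, absoluteURL(page, href))
	}
}
```

Resolving links this way keeps the crawler from requesting strings such as `/wiki/Go` that are not valid URLs on their own.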

See the examples folder for more detailed examples.

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.
