hdq is a Go+ package for processing HTML documents.
How do you collect all the links on an HTML page? With hdq, it is very easy:
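    import "github.com/qiniu/hdq"

    // links returns every non-empty href value found in the page at url.
    func links(url interface{}) []string {
        // doc is a node set that holds only the root node of the document.
        doc := hdq.Source(url)
        // For each a element, read href (defaulting to "") and keep non-empty values.
        return [link for a <- doc.any.a, link := a.hrefVal?:""; link != ""]
    }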
First, we call hdq.Source(url) to create a node set named doc. It contains only one node, the root node of the document.
Next, we select all anchor (a) elements with doc.any.a. Here doc.any means all nodes in the HTML document.
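Then we visit each of these a elements, read its href attribute value, and assign it to the variable link. If the attribute is missing, the expression a.hrefVal?:"" falls back to the empty string. If link is not empty, we collect it.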
Finally, we return all collected links. See tutorial/01-Links for the full source code.