Extract all links from a web page's HTML

[![reqwest-badge]][reqwest] [![select-badge]][select] [![cat-net-badge]][cat-net]

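Use `reqwest::get` to perform an HTTP GET request, then use `Document::from_read` to parse the response into an HTML document. Calling `find` with the `Name("a")` predicate retrieves all link (`<a>`) elements. Calling `filter_map` on the resulting iterator of nodes keeps only the URLs of links that have an `"href"` attribute (`attr`).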

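```rust,no_run
# #[macro_use]
# extern crate error_chain;
extern crate reqwest;
extern crate select;

use select::document::Document;
use select::predicate::Name;
#
# error_chain! {
#    foreign_links {
#        ReqError(reqwest::Error);
#        IoError(std::io::Error);
#    }
# }

fn run() -> Result<()> {
    // Fetch the page with a blocking GET request
    // (in reqwest 0.10 and later this function is `reqwest::blocking::get`).
    let res = reqwest::get("https://www.rust-lang.org/en-US/")?;

    // Parse the response body as HTML, select every <a> element,
    // keep only those that carry an `href` attribute, and print each URL.
    Document::from_read(res)?
        .find(Name("a"))
        .filter_map(|n| n.attr("href"))
        .for_each(|x| println!("{}", x));

    Ok(())
}
#
# quick_main!(run);
```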
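The recipe's core is a two-stage filter: select `<a>` elements, then use `filter_map` to drop anchors without an `href` while unwrapping the rest. The following is a minimal, stdlib-only sketch of that logic so it can run without network access or external crates. The `Element` type and the `extract_links` function are hypothetical stand-ins invented for illustration. They are not part of the `select` API, where `find(Name("a"))` and `Node::attr` do this work on a real parsed document.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed HTML element: a tag name plus its
/// attributes. This is NOT the `select` crate's `Node` type.
struct Element {
    name: &'static str,
    attrs: HashMap<&'static str, &'static str>,
}

impl Element {
    fn new(name: &'static str, attrs: &[(&'static str, &'static str)]) -> Self {
        Element { name, attrs: attrs.iter().cloned().collect() }
    }

    /// Mirrors the idea of `Node::attr`: `Some(value)` if the attribute exists.
    fn attr(&self, key: &str) -> Option<&'static str> {
        self.attrs.get(key).copied()
    }
}

/// Keep only `<a>` elements (the role `Name("a")` plays in the recipe),
/// then let `filter_map` discard anchors whose `attr("href")` is `None`.
fn extract_links(elements: &[Element]) -> Vec<&'static str> {
    elements
        .iter()
        .filter(|e| e.name == "a")
        .filter_map(|e| e.attr("href"))
        .collect()
}

fn main() {
    let page = vec![
        Element::new("a", &[("href", "https://www.rust-lang.org/")]),
        Element::new("a", &[("name", "top")]), // anchor without href: skipped
        Element::new("img", &[("src", "logo.png")]), // not an <a>: skipped
        Element::new("a", &[("href", "/learn")]),
    ];

    for link in extract_links(&page) {
        println!("{}", link);
    }
}
```

Running this prints `https://www.rust-lang.org/` and `/learn`. Because `filter_map` both filters and unwraps the `Option` in a single step, the closure never needs an explicit `if let` or `unwrap`. Note that the printed values are the raw `href` strings, so relative links such as `/learn` are not resolved against the page's base URL.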