Pull out bits of URLs provided on stdin
If you have Go installed and configured:
▶ go install github.com/tomnomnom/unfurl@latest
Otherwise, download the latest binary for your platform, extract it, and move it to somewhere in your $PATH (e.g. /usr/bin/):
▶ wget https://github.com/tomnomnom/unfurl/releases/download/v0.0.1/unfurl-linux-amd64-0.0.1.tgz
▶ tar xzf unfurl-linux-amd64-0.0.1.tgz
▶ sudo mv unfurl /usr/bin/
unfurl works with URLs provided on stdin; they might come from a file like this one:
▶ cat urls.txt
https://sub.example.com/users?id=123&name=Sam
https://sub.example.com/orgs?org=ExCo#about
http://example.net/about#contact
You can extract the domains from the URLs with the domains mode:
▶ cat urls.txt | unfurl domains
sub.example.com
sub.example.com
example.net
If you don't want to output duplicate values you can use the -u or --unique flag:
▶ cat urls.txt | unfurl --unique domains
sub.example.com
example.net
The -u/--unique flag works for all modes.
▶ cat urls.txt | unfurl paths
/users
/orgs
/about
▶ cat urls.txt | unfurl keys
id
name
org
▶ cat urls.txt | unfurl values
123
Sam
ExCo
▶ cat urls.txt | unfurl keypairs
id=123
name=Sam
org=ExCo
You can use the format mode to specify a custom output format:
▶ cat urls.txt | unfurl format %d%p
sub.example.com/users
sub.example.com/orgs
example.net/about
The available format directives are:
%%  A literal percent character
%s  The request scheme (e.g. https)
%u  The user info (e.g. user:pass)
%d  The domain (e.g. sub.example.com)
%S  The subdomain (e.g. sub)
%r  The root of domain (e.g. example)
%t  The TLD (e.g. com)
%P  The port (e.g. 8080)
%p  The path (e.g. /users)
%q  The raw query string (e.g. a=1&b=2)
%f  The page fragment (e.g. page-section)
%@  Inserts an @ if user info is specified
%:  Inserts a colon if a port is specified
%?  Inserts a question mark if a query string exists
%#  Inserts a hash if a fragment exists
%a  Authority (alias for %u%@%d%:%P)
Any characters that don't match a format directive remain untouched:
▶ cat urls.txt | unfurl -u format "%d (%s)"
sub.example.com (https)
example.net (http)
Note that if a URL does not include the data requested, there will be no output for that URL:
▶ echo http://example.com | unfurl format "%P"
▶ echo http://example.com:8080 | unfurl format "%P"
8080
▶ unfurl -h
Format URLs provided on stdin

Usage:
  unfurl [OPTIONS] [MODE] [FORMATSTRING]

Options:
  -u, --unique   Only output unique values
  -v, --verbose  Verbose mode (output URL parse errors)

Modes:
  keys      Keys from the query string (one per line)
  values    Values from the query string (one per line)
  keypairs  Key=value pairs from the query string (one per line)
  domains   The hostname (e.g. sub.example.com)
  paths     The request path (e.g. /users)
  format    Specify a custom format (see below)

Format Directives:
  %%  A literal percent character
  %s  The request scheme (e.g. https)
  %u  The user info (e.g. user:pass)
  %d  The domain (e.g. sub.example.com)
  %P  The port (e.g. 8080)
  %p  The path (e.g. /users)
  %q  The raw query string (e.g. a=1&b=2)
  %f  The page fragment (e.g. page-section)
  %@  Inserts an @ if user info is specified
  %:  Inserts a colon if a port is specified
  %?  Inserts a question mark if a query string exists
  %#  Inserts a hash if a fragment exists
  %a  Authority (alias for %u%@%d%:%P)

Examples:
  cat urls.txt | unfurl keys
  cat urls.txt | unfurl format %s://%d%p?%q