hackage-download - Download all of Hackage
Script to download all of Hackage, either the --latest versions or --all historical versions.
./hackage-download.py --latest
./hackage-download.py --all
Both drop each package into the current directory.
./hackage-download.py --latest finishes in 77 seconds on my desktop with Gigabit Internet:
- 13 seconds are spent in cabal list --simple
- 10 seconds are the download of the 13981 latest .tar.gz packages (900 MB)
- 54 seconds are the un-gzipping onto my spinning disk (4.4 GB)
Running cabal get in parallel is 80x slower than my parallel wget:
cabal list --simple | cut -d' ' -f1 | sort -u | parallel -j100 --load 10 cabal get {}
12535.05s user 566.04s system 276% cpu 1:19:03.51 total
This is with cabal 2.2.0.0.
Downloading latest packages, with GNU parallel to render a progress bar:
cabal list --simple | python3 -c 'from fileinput import *; [print("https://hackage.haskell.org/package/"+p+"/"+p+"-"+ver+".tar.gz") for (p,ver) in sorted(dict(map(str.split, input())).items())]' | time parallel --bar -P 100 -n 32 wget --quiet {}
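The Python one-liner in that pipeline is dense, so here is a rough expanded sketch of what it appears to do. The function name latest_tarball_urls is made up for illustration. Like the one-liner, it assumes that each input line is "name version" and that the last line seen for a package carries its latest version.

```python
def latest_tarball_urls(lines):
    """Build Hackage tarball URLs from `cabal list --simple` style lines.

    Hypothetical helper, not part of hackage-download.py.
    """
    latest = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, version = parts
        # Later lines overwrite earlier ones, so the last version listed wins.
        latest[name] = version
    return [
        f"https://hackage.haskell.org/package/{name}/{name}-{version}.tar.gz"
        for name, version in sorted(latest.items())
    ]
```

Passing sys.stdin as lines and printing each returned URL should give the same kind of list that the pipeline hands to parallel and wget.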
- Some packages from cabal list --simple fail to download from Hackage, e.g. with 410 Gone or other errors.
- Some .tar.gz packages set funny permissions on the unpacked files, which makes removing them a bit of a nuisance.
- Some .tar.gz packages have "trailing garbage", resulting in decompression warnings.
- As a result, the script must do lenient error handling, and will not report correctly on e.g. disk write errors.
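For the permissions nuisance, one possible workaround is to give the owner back full access to every directory before deleting the tree. This is only a sketch under the assumption of a POSIX filesystem, where deleting a file needs write permission on its parent directory. The name force_rmtree is made up for illustration and is not part of hackage-download.py.

```python
import os
import shutil
import stat


def force_rmtree(path):
    """Remove a directory tree even if unpacked directories lack owner permissions.

    Hypothetical helper; assumes POSIX permission semantics.
    """
    # Restore owner rwx on the top directory so it can be listed and modified.
    os.chmod(path, os.stat(path).st_mode | stat.S_IRWXU)
    # os.walk is top-down by default, so each subdirectory is fixed
    # before the walk tries to descend into it.
    for root, dirs, _files in os.walk(path):
        for d in dirs:
            full = os.path.join(root, d)
            if not os.path.islink(full):  # never chmod through symlinks
                os.chmod(full, os.stat(full).st_mode | stat.S_IRWXU)
    shutil.rmtree(path)
```

Calling force_rmtree on an unpacked package directory should then succeed where a plain rm -r stops with permission errors, provided you own the files.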