DwarFS

A fast high compression read-only file system

Overview

DwarFS is a read-only file system with a focus on achieving very high compression ratios, in particular for very redundant data.

This probably doesn't sound very exciting, because if it's redundant, it should compress well. However, I found that other read-only, compressed file systems don't do a very good job at making use of this redundancy. See the Comparison section below for a comparison with other compressed file systems.

DwarFS also doesn't compromise on speed; for my use cases, I've found it to be on par with or better than SquashFS. For my primary use case, DwarFS compression is close to an order of magnitude better than SquashFS compression, the file system is 4 times faster to build, files are typically faster to access, and it uses less CPU.

Distinct features of DwarFS are:

  • Clustering of files by similarity using a similarity hash function. This makes it easier to exploit the redundancy across file boundaries.

  • Segmentation analysis across file system blocks in order to reduce the size of the uncompressed file system. This saves memory when using the compressed file system and thus potentially allows for higher cache hit rates as more data can be kept in the cache.

  • Highly multi-threaded implementation. Both the file system creation tool and the FUSE driver are able to make good use of the many cores of your system.

  • Optional experimental Lua support to provide custom filtering and ordering functions.
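
To illustrate the first point in the list above, here is a minimal Python sketch of ordering files by similarity. It is not DwarFS's actual algorithm, which this README does not describe. As a stand-in it uses a generic SimHash-style fingerprint over 4-byte shingles and a greedy nearest-neighbour ordering by Hamming distance. All names (simhash, hamming, order_by_similarity) are hypothetical.

```python
import hashlib


def simhash(data: bytes, shingle: int = 4, bits: int = 64) -> int:
    """Toy SimHash: similar inputs tend to yield fingerprints that differ in few bits."""
    counts = [0] * bits
    # Slide a window of `shingle` bytes over the data and let every
    # window vote +1/-1 on each bit of the fingerprint.
    for i in range(max(1, len(data) - shingle + 1)):
        digest = hashlib.blake2b(data[i:i + shingle], digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(bits):
            counts[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def order_by_similarity(files: dict[str, bytes]) -> list[str]:
    """Greedy ordering: always continue with the file closest to the previous one."""
    hashes = {name: simhash(data) for name, data in files.items()}
    remaining = list(files)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        last = hashes[order[-1]]
        nearest = min(remaining, key=lambda name: hamming(hashes[name], last))
        remaining.remove(nearest)
        order.append(nearest)
    return order
```

The idea behind such an ordering is that a block compressor only sees a limited window of data, so placing near-identical files next to each other lets it exploit redundancy across file boundaries. The greedy loop is quadratic in the number of files and is only meant to show the principle.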

History

I started working on DwarFS in 2013 and my main use case and major motivation was that I had several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space, and I was unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them.

Up until then, I had been using Cromfs for squeezing them into a manageable size. However, I was getting more and more annoyed by the time it took to build the filesystem image and, to make things worse, more often than not it was crashing after about an hour or so.

I had obviously also looked into SquashFS, but never got anywhere close to the compression ratios of Cromfs.

This alone wouldn't have been enough to get me into writing DwarFS, but at around the same time, I was pretty obsessed with the recent developments and features of newer C++ standards and really wanted a C++ hobby project to work on. Also, I've wanted to do something with FUSE for quite some time. Last but not least, I had been thinking about the problem of compressed file systems for a bit and had some ideas that I definitely wanted to try.

The majority of the code was written in 2013, then I did a couple of cleanups, bugfixes and refactors every once in a while, but I never really got it to a state where I would feel happy releasing it. It was too awkward to build with its dependency on Facebook's (quite awesome) folly library and it didn't have any documentation.

Digging out the project again this year, things didn't look as grim as they used to. Folly now builds with CMake and so I just pulled it in as a submodule. Most other dependencies can be satisfied from packages that should be widely available. And I've written some rudimentary docs as well.

Building and Installing

Dependencies

DwarFS uses CMake as a build tool.

It uses both Boost and Folly, though the latter is included as a submodule since very few distributions actually offer packages for it. Folly itself has a number of dependencies, so please check Folly's own documentation for an up-to-date list.

Other than that, DwarFS really only depends on FUSE3 and on a set of compression libraries that Folly already depends on (namely lz4, zstd and liblzma).

The dependency on googletest will be automatically resolved if you build with tests.

A good starting point for apt-based systems is probably:

# apt install \
    g++ \
    clang \
    cmake \
    make \
    pkg-config \
    binutils-dev \
    libboost-all-dev \
    libevent-dev \
    libdouble-conversion-dev \
    libgoogle-glog-dev \
    libgflags-dev \
    libiberty-dev \
    liblz4-dev \
    liblzma-dev \
    libzstd-dev \
    libsnappy-dev \
    libjemalloc-dev \
    libssl-dev \
    libunwind-dev \
    libfmt-dev \
    libfuse3-dev \
    libsparsehash-dev \
    zlib1g-dev

You can pick either clang or g++, but at least recent clang versions will produce substantially faster code:

$ hyperfine ./dwarfs_test-*
Benchmark #1: ./dwarfs_test-clang-O2
  Time (mean ± σ):      9.425 s ±  0.049 s    [User: 15.724 s, System: 0.773 s]
  Range (min … max):    9.373 s …  9.523 s    10 runs
 
Benchmark #2: ./dwarfs_test-clang-O3
  Time (mean ± σ):      9.328 s ±  0.045 s    [User: 15.593 s, System: 0.791 s]
  Range (min … max):    9.277 s …  9.418 s    10 runs
 
Benchmark #3: ./dwarfs_test-gcc-O2
  Time (mean ± σ):     13.798 s ±  0.035 s    [User: 20.161 s, System: 0.767 s]
  Range (min … max):   13.731 s … 13.852 s    10 runs
 
Benchmark #4: ./dwarfs_test-gcc-O3
  Time (mean ± σ):     13.223 s ±  0.034 s    [User: 19.576 s, System: 0.769 s]
  Range (min … max):   13.176 s … 13.278 s    10 runs
 
Summary
  './dwarfs_test-clang-O3' ran
    1.01 ± 0.01 times faster than './dwarfs_test-clang-O2'
    1.42 ± 0.01 times faster than './dwarfs_test-gcc-O3'
    1.48 ± 0.01 times faster than './dwarfs_test-gcc-O2'

$ hyperfine -L prog $(echo ./mkdwarfs-* | tr ' ' ,) '{prog} --no-progress --log-level warn -i tree -o /dev/null -C null'
Benchmark #1: ./mkdwarfs-clang-O2 --no-progress --log-level warn -i tree -o /dev/null -C null
  Time (mean ± σ):      4.358 s ±  0.033 s    [User: 6.364 s, System: 0.622 s]
  Range (min … max):    4.321 s …  4.408 s    10 runs
 
Benchmark #2: ./mkdwarfs-clang-O3 --no-progress --log-level warn -i tree -o /dev/null -C null
  Time (mean ± σ):      4.282 s ±  0.035 s    [User: 6.249 s, System: 0.623 s]
  Range (min … max):    4.244 s …  4.349 s    10 runs
 
Benchmark #3: ./mkdwarfs-gcc-O2 --no-progress --log-level warn -i tree -o /dev/null -C null
  Time (mean ± σ):      6.212 s ±  0.031 s    [User: 8.185 s, System: 0.638 s]
  Range (min … max):    6.159 s …  6.250 s    10 runs
 
Benchmark #4: ./mkdwarfs-gcc-O3 --no-progress --log-level warn -i tree -o /dev/null -C null
  Time (mean ± σ):      5.740 s ±  0.037 s    [User: 7.742 s, System: 0.645 s]
  Range (min … max):    5.685 s …  5.796 s    10 runs
 
Summary
  './mkdwarfs-clang-O3 --no-progress --log-level warn -i tree -o /dev/null -C null' ran
    1.02 ± 0.01 times faster than './mkdwarfs-clang-O2 --no-progress --log-level warn -i tree -o /dev/null -C null'
    1.34 ± 0.01 times faster than './mkdwarfs-gcc-O3 --no-progress --log-level warn -i tree -o /dev/null -C null'
    1.45 ± 0.01 times faster than './mkdwarfs-gcc-O2 --no-progress --log-level warn -i tree -o /dev/null -C null'

These measurements were made with gcc-9.3.0 and clang-10.0.1.

Building

Firstly, either clone the repository...

# git clone --recurse-submodules https://github.com/mhx/dwarfs
# cd dwarfs

...or unpack the release archive:

# tar xvf dwarfs-x.y.z.tar.bz2
# cd dwarfs-x.y.z

Once all dependencies have been installed, you can build DwarFS using:

# mkdir build
# cd build
# cmake .. -DWITH_TESTS=1
# make -j$(nproc)

If possible, try building with clang as your compiler; this will make DwarFS significantly faster. If you have both gcc and clang installed, use:

# cmake .. -DWITH_TESTS=1 -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++

To build with experimental Lua support, you need to install both lua and luabind. The latter isn't very well maintained and I hope to get rid of the dependency in the future. Add -DWITH_LUA=1 to the cmake command line to enable Lua support.

You can then run tests with:

# make test

Installing

Installing is as easy as:

# sudo make install

Though you don't have to install the tools to play with them.

Usage

Please check out the man pages for mkdwarfs and dwarfs. dwarfsck will be built and installed as well, but it's still a work in progress.

Comparison

With SquashFS

These tests were done on an Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz 6 core CPU with 64 GiB of RAM. The system was mostly idle during all of the tests.

The source directory contained 1139 different Perl installations from 284 distinct releases, a total of 47.65 GiB of data in 1,927,501 files and 330,733 directories. The source directory was freshly unpacked from a tar archive to an 850 EVO 1TB SSD, so most of its contents were likely cached.

I'm using the same compression type and compression level for SquashFS that is the default setting for DwarFS:

$ time mksquashfs install perl-install.squashfs -comp zstd -Xcompression-level 22
Parallel mksquashfs: Using 12 processors
Creating 4.0 filesystem on perl-install.squashfs, block size 131072.
[=====================================================================-] 2107401/2107401 100%

Exportable Squashfs 4.0 filesystem, zstd compressed, data block size 131072
        compressed data, compressed metadata, compressed fragments,
        compressed xattrs, compressed ids
        duplicates are removed
Filesystem size 4637597.63 Kbytes (4528.90 Mbytes)
        9.29% of uncompressed filesystem size (49922299.04 Kbytes)
Inode table size 19100802 bytes (18653.13 Kbytes)
        26.06% of uncompressed inode table size (73307702 bytes)
Directory table size 19128340 bytes (18680.02 Kbytes)
        46.28% of uncompressed directory table size (41335540 bytes)
Number of duplicate files found 1780387
Number of inodes 2255794
Number of files 1925061
Number of fragments 28713
Number of symbolic links  0
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 330733
Number of ids (unique uids + gids) 2
Number of uids 1
        mhx (1000)
Number of gids 1
        users (100)

real    69m18.427s
user    817m15.199s
sys     1m38.237s

For DwarFS, I'm sticking to the defaults:

$ time ./mkdwarfs -i install -o perl-install.dwarfs
23:10:49.834964 scanning install
23:11:04.624000 waiting for background scanners...
23:12:41.876712 finding duplicate files...
23:12:53.441437 saved 28.2 GiB / 47.65 GiB in 1782826/1927501 duplicate files
23:12:53.441505 ordering 144675 inodes by similarity...
23:12:53.986472 144675 inodes ordered [544.9ms]
23:12:53.986562 numbering file inodes...
23:12:53.988970 building metadata...
23:12:53.989045 building blocks...
23:12:53.989118 saving links...
23:12:54.054908 saving names...
23:12:54.054999 compressing names table...
23:12:54.091963 names table: 111.4 KiB (9.979 KiB saved) [36.91ms]
23:12:54.092014 updating name offsets...
23:26:14.848604 saving chunks...
23:26:14.872847 saving chunk index...
23:26:14.873130 saving directories...
23:26:15.589713 saving inode index...
23:26:15.591260 saving metadata config...
23:27:13.313457 compressed 47.65 GiB to 529.4 MiB (ratio=0.0108502)
23:27:13.793889 filesystem created without errors [984s]
-------------------------------------------------------------------------------

scanned/found: 330733/330733 dirs, 0/0 links, 1927501/1927501 files
original size: 47.65 GiB, dedupe: 28.2 GiB (1782826 files), segment: 12.42 GiB
filesystem: 7.027 GiB in 450 blocks (390195 chunks, 144675/144675 inodes)
compressed filesystem: 450 blocks/529.4 MiB written
|=============================================================================|

real    16m24.108s
user    116m27.381s
sys     3m9.115s

So in this comparison, mkdwarfs is more than 4 times faster than mksquashfs in wall-clock time. In terms of total CPU time (user plus system), it uses almost 7 times less.
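
The two ratios can be reproduced from the `time` output of both runs. The following Python sketch copies the figures from the listings above; the helper names (parse_time, wall, cpu) are made up for this example.

```python
import re


def parse_time(text: str) -> float:
    """Convert a shell `time` value such as '69m18.427s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unexpected time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Figures copied from the mksquashfs and mkdwarfs runs above.
squashfs = {"real": "69m18.427s", "user": "817m15.199s", "sys": "1m38.237s"}
dwarfs = {"real": "16m24.108s", "user": "116m27.381s", "sys": "3m9.115s"}


def wall(run: dict) -> float:
    """Elapsed wall-clock time in seconds."""
    return parse_time(run["real"])


def cpu(run: dict) -> float:
    """Total CPU time (user + system) in seconds."""
    return parse_time(run["user"]) + parse_time(run["sys"])


wall_ratio = wall(squashfs) / wall(dwarfs)
cpu_ratio = cpu(squashfs) / cpu(dwarfs)

print(f"wall-clock: mkdwarfs is {wall_ratio:.2f}x faster")   # 4.23x
print(f"CPU time:   mkdwarfs uses {cpu_ratio:.2f}x less")    # 6.85x
```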

$ ls -l perl-install.*fs
-rw-r--r-- 1 mhx users  555118147 Nov 24 23:27 perl-install.dwarfs
-rw-r--r-- 1 mhx users 4748902400 Nov 25 00:37 perl-install.squashfs

In terms of compression ratio, the DwarFS file system is more than 8 times smaller than the SquashFS file system. With DwarFS, the content has been compressed down to 1.1% (!) of its original size.

DwarFS also features an option to recompress an existing file system with a different compression algorithm. This can be useful as it allows relatively fast experimentation with different algorithms and options without requiring a full rebuild of the file system. For example, recompressing the above file system with the best possible compression (lzma:level=9:extreme):

$ time mkdwarfs --recompress -i perl-install.dwarfs -o perl-lzma.dwarfs -C lzma:level=9:extreme
00:59:17.706321 filesystem rewritten [676.5s]
-------------------------------------------------------------------------------

scanned/found: 0/0 dirs, 0/0 links, 0/0 files
original size: 47.65 GiB, dedupe: 0 B (0 files), segment: 0 B
filesystem: 7.027 GiB in 450 blocks (0 chunks, 0/0 inodes)
compressed filesystem: 450 blocks/456.7 MiB written
|====================================================                         |

real    11m16.672s
user    121m44.054s
sys     1m45.250s

$ ls -l perl-*.dwarfs
-rw-r--r-- 1 mhx users 555118147 Nov 24 23:27 perl-install.dwarfs
-rw-r--r-- 1 mhx users 478893736 Nov 25 00:59 perl-lzma.dwarfs

This reduces the file system size by roughly another 14%, pushing the total compression ratio below 1%.
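
The size figures in this section follow directly from the byte counts in the `ls -l` listings and the 47.65 GiB original size reported by mkdwarfs. A short Python sketch of the arithmetic, with variable names chosen only for this example:

```python
GIB = 1024 ** 3

original = 47.65 * GIB          # size of the source directory, as reported by mkdwarfs
squashfs = 4_748_902_400        # perl-install.squashfs
dwarfs_zstd = 555_118_147       # perl-install.dwarfs (default compression)
dwarfs_lzma = 478_893_736       # perl-lzma.dwarfs (lzma:level=9:extreme)

size_factor = squashfs / dwarfs_zstd
zstd_ratio = dwarfs_zstd / original
lzma_saving = 1 - dwarfs_lzma / dwarfs_zstd
lzma_ratio = dwarfs_lzma / original

print(f"SquashFS image is {size_factor:.2f}x larger than DwarFS")  # 8.55x
print(f"DwarFS (zstd) is {zstd_ratio:.1%} of the original")        # 1.1%
print(f"lzma recompression saves another {lzma_saving:.1%}")       # 13.7%
print(f"DwarFS (lzma) is {lzma_ratio:.2%} of the original")        # 0.94%
```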

In terms of how fast the file system is when using it, a quick test I've done is to freshly mount the filesystem created above and run each of the 1139 perl executables to print their version.

$ hyperfine -c "umount mnt" -p "umount mnt; ./dwarfs perl-install.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      4.092 s ±  0.031 s    [User: 2.183 s, System: 4.355 s]
  Range (min … max):    4.022 s …  4.122 s    10 runs
 
Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      2.698 s ±  0.027 s    [User: 1.979 s, System: 3.977 s]
  Range (min … max):    2.657 s …  2.732 s    10 runs
 
Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      2.341 s ±  0.029 s    [User: 1.883 s, System: 3.794 s]
  Range (min … max):    2.303 s …  2.397 s    10 runs
 
Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):      2.207 s ±  0.037 s    [User: 1.818 s, System: 3.673 s]
  Range (min … max):    2.163 s …  2.278 s    10 runs

These timings are for initial runs on a freshly mounted file system, running 5, 10, 15 and 20 processes in parallel. 2.2 seconds means that it takes only about 2 milliseconds per Perl binary.

The following are timings for subsequent runs, both on DwarFS (at mnt) and the original EXT4 (at install). DwarFS is around 12% slower here:

$ hyperfine -P procs 10 20 -D 10 -w1 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'" "ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     655.8 ms ±   5.5 ms    [User: 1.716 s, System: 2.784 s]
  Range (min … max):   647.6 ms … 664.3 ms    10 runs
 
Benchmark #2: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     583.9 ms ±   5.0 ms    [User: 1.715 s, System: 2.773 s]
  Range (min … max):   577.0 ms … 592.0 ms    10 runs
 
Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     638.2 ms ±  10.7 ms    [User: 1.667 s, System: 2.736 s]
  Range (min … max):   629.1 ms … 658.4 ms    10 runs
 
Benchmark #4: ls -1 install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     567.0 ms ±   3.2 ms    [User: 1.684 s, System: 2.719 s]
  Range (min … max):   561.5 ms … 570.5 ms    10 runs
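
The relative slowdown with warm caches can be read off the mean times in the hyperfine output above. A small Python sketch of that calculation, using only the four mean values (the dictionary layout is just for illustration):

```python
# Mean run times in milliseconds, keyed by the xargs -P value.
mean_ms = {
    10: {"dwarfs": 655.8, "ext4": 583.9},
    20: {"dwarfs": 638.2, "ext4": 567.0},
}

# Relative slowdown of DwarFS compared to EXT4 for each parallelism level.
slowdown = {
    procs: times["dwarfs"] / times["ext4"] - 1
    for procs, times in mean_ms.items()
}

for procs, value in slowdown.items():
    print(f"-P{procs}: DwarFS is {value:.1%} slower than EXT4")
# -P10: DwarFS is 12.3% slower than EXT4
# -P20: DwarFS is 12.6% slower than EXT4
```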

Using the lzma-compressed file system, the metrics for initial runs look considerably worse:

$ hyperfine -c "umount mnt" -p "umount mnt; ./dwarfs perl-lzma.dwarfs mnt -o cachesize=1g -o workers=4; sleep 1" -P procs 5 20 -D 5 "ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P{procs} sh -c '\$0 -v >/dev/null'"
Benchmark #1: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P5 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     20.372 s ±  0.135 s    [User: 2.338 s, System: 4.511 s]
  Range (min … max):   20.208 s … 20.601 s    10 runs
 
Benchmark #2: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P10 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     13.015 s ±  0.094 s    [User: 2.148 s, System: 4.120 s]
  Range (min … max):   12.863 s … 13.144 s    10 runs
 
Benchmark #3: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P15 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     11.533 s ±  0.058 s    [User: 2.013 s, System: 3.970 s]
  Range (min … max):   11.469 s … 11.649 s    10 runs
 
Benchmark #4: ls -1 mnt/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '$0 -v >/dev/null'
  Time (mean ± σ):     11.402 s ±  0.095 s    [User: 1.906 s, System: 3.787 s]
  Range (min … max):   11.297 s … 11.568 s    10 runs

So you might want to consider using zstd instead of lzma if you'd like to optimize for file system performance. It's also the default compression used by mkdwarfs.

On a different system, Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, with 4 cores, I did more tests with both SquashFS and DwarFS (just because on the 6 core box my kernel didn't have support for zstd in SquashFS):

$ hyperfine -c 'sudo umount /tmp/perl/install' -p 'umount /tmp/perl/install; ./dwarfs perl-install.dwarfs /tmp/perl/install -o cachesize=1g -o workers=4; sleep 1' -n dwarfs-zstd "ls -1 /tmp/perl/install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'" -p 'sudo umount /tmp/perl/install; sudo mount -t squashfs perl-install.squashfs /tmp/perl/install; sleep 1' -n squashfs-zstd "ls -1 /tmp/perl/install/*/*/bin/perl5* | xargs -d $'\n' -n1 -P20 sh -c '\$0 -v >/dev/null'"
Benchmark #1: dwarfs-zstd
  Time (mean ± σ):      2.071 s ±  0.372 s    [User: 1.727 s, System: 2.866 s]
  Range (min … max):    1.711 s …  2.532 s    10 runs
 
Benchmark #2: squashfs-zstd
  Time (mean ± σ):      3.668 s ±  0.070 s    [User: 2.173 s, System: 21.287 s]
  Range (min … max):    3.616 s …  3.846 s    10 runs
 
Summary
  'dwarfs-zstd' ran
    1.77 ± 0.32 times faster than 'squashfs-zstd'

So DwarFS is almost twice as fast as SquashFS. But what's more, SquashFS also uses significantly more CPU power. However, the numbers shown above for DwarFS obviously don't include the time spent in the dwarfs process, so I repeated the test outside of hyperfine:

$ time ./dwarfs perl-install.dwarfs /tmp/perl/install -o cachesize=1g -o workers=4 -f

real    0m8.463s
user    0m3.821s
sys     0m2.117s

So in total, DwarFS was using 10.5 seconds of CPU time, whereas SquashFS was using 23.5 seconds, more than twice as much. Ignore the 'real' time; it only reflects how long it took me to unmount the file system again after mounting it.
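
These totals are the sum of user and system time from the hyperfine run plus, for DwarFS, the user and system time of the separately timed dwarfs process. SquashFS was mounted through the kernel in this test, so there is no extra user-space process to add. A Python sketch of the sums, with illustrative variable names:

```python
# (user, system) CPU seconds, copied from the outputs above.
hyperfine_dwarfs = (1.727, 2.866)      # benchmark commands running on DwarFS
dwarfs_process = (3.821, 2.117)        # the dwarfs FUSE process itself
hyperfine_squashfs = (2.173, 21.287)   # benchmark commands running on SquashFS

dwarfs_total = sum(hyperfine_dwarfs) + sum(dwarfs_process)
squashfs_total = sum(hyperfine_squashfs)

print(f"DwarFS:   {dwarfs_total:.1f} s")                   # 10.5 s
print(f"SquashFS: {squashfs_total:.1f} s")                 # 23.5 s
print(f"ratio:    {squashfs_total / dwarfs_total:.2f}x")   # 2.23x
```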

Another real-life test was to build and test a Perl module with 624 different Perl versions in the compressed file system. The module I've used, Tie::Hash::Indexed, has an XS component that requires a C compiler to build. So this really accesses a lot of different stuff in the file system:

  • The perl executables and their shared libraries

  • The Perl modules used for writing the Makefile

  • Perl's C header files used for building the module

  • More Perl modules used for running the tests

I wrote a little script to be able to run multiple builds in parallel:

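#!/bin/bash
# Build and test Tie::Hash::Indexed with the perl binary given as $1.
set -eu
perl=$1
# Name the work directory after path fields 5 and 6 of the perl binary,
# i.e. the two directories below /tmp/perl/install.
dir=$(echo "$perl" | cut -d/ --output-delimiter=- -f5,6)
# Work on a private copy so that parallel builds don't interfere.
rsync -a Tie-Hash-Indexed-0.08/ "$dir/"
cd "$dir"
"$perl" Makefile.PL >/dev/null 2>&1
make test >/dev/null 2>&1
cd ..
rm -rf "$dir"
# Only reached if the build and the tests succeeded (set -e).
echo "$perl"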

The following command will run up to 8 builds in parallel on the 4 core i7 CPU, including debug, optimized and threaded versions of all Perl releases between 5.10.0 and 5.33.3, a total of 624 perl installations:

$ time ls -1 /tmp/perl/install/*/perl-5.??.?/bin/perl5* | sort -t / -k 8 | xargs -d $'\n' -P 8 -n 1 ./build.sh

Tests were done with a cleanly mounted file system to make sure the caches were empty. ccache was primed to make sure all compiler runs could be satisfied from the cache. With SquashFS, the timing was:

real    3m17.182s
user    20m54.064s
sys     4m16.907s

And with DwarFS:

real    3m14.402s
user    19m42.984s
sys     2m49.292s

So, frankly, not much of a difference. The dwarfs process itself used:

real    4m23.151s
user    0m25.036s
sys     0m35.216s

So again, DwarFS used less raw CPU power, but in terms of wall-clock time, the difference is really marginal.
