
Use better compression algorithm than 'deflate' #6

Open
snej opened this issue Jan 11, 2018 · 3 comments

Comments

@snej
Contributor

snej commented Jan 11, 2018

Alternate, newer compression algorithms we can consider:

  • Zstandard: "providing high compression ratios … a very wide range of compression / speed trade-off, while being backed by a very fast decoder"
  • lz4: "compression speed at 400 MB/s per core … extremely fast decoder, with speed in multiple GB/s per core"
  • Brotli: "compression ratio comparable to the best currently available general-purpose compression methods … similar in speed with deflate but offers more dense compression"
  • lzfse: "compresses with a ratio comparable to that of zlib (DEFLATE) and decompresses 2–3x faster while using fewer resources"
  • Snappy: "aims for very high speeds and reasonable compression … an order of magnitude faster [than zlib] for most inputs, but the resulting compressed files are anywhere from 20% to 100% bigger"

Of these, Zstandard seems the most attractive. According to the comparison table on its home page, its speed is about 3x that of zlib (almost as good as Snappy), while its compression ratio is better than zlib's. A Go wrapper package is also available.

lz4 is the speed champion, but its compression ratio is about the same as Snappy's (i.e. not great).
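For concreteness, here's a minimal Go sketch of buffer-to-buffer Zstandard compression. The issue doesn't name a specific wrapper, so the choice of github.com/klauspost/compress/zstd here is an assumption for illustration:

```go
package main

import (
	"fmt"

	"github.com/klauspost/compress/zstd" // assumed wrapper; the issue doesn't name one
)

func main() {
	data := []byte(`{"example": "a typical JSON document"}`)

	// A reusable encoder; passing a nil writer is fine when only
	// EncodeAll (buffer-to-buffer) is used.
	enc, err := zstd.NewWriter(nil)
	if err != nil {
		panic(err)
	}
	defer enc.Close()
	compressed := enc.EncodeAll(data, nil)

	// A matching decoder for the round trip.
	dec, err := zstd.NewReader(nil)
	if err != nil {
		panic(err)
	}
	defer dec.Close()
	out, err := dec.DecodeAll(compressed, nil)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d -> %d bytes, round trip ok: %v\n",
		len(data), len(compressed), string(out) == string(data))
}
```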

@daverigby

KV-Engine did some comparisons of Snappy, LZ4, and zstd recently for per-document compression (I'm on mobile so I don't have the doc link to hand), but we found much lower compression ratios than the published numbers when compressing "typical" documents, i.e. JSON on the order of 1 KB in size.

Much of the compression comes from having a good-sized corpus, which isn't the case when compressing just one doc at a time. We decided to stick with Snappy (as it's already used in parts of the stack and is more easily available), but LZ4 was the winner otherwise: zstd didn't compress that much more than LZ4 and was a lot more expensive.
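(For illustration only: a small Go sketch of that kind of per-document measurement, using the github.com/golang/snappy package. KV-Engine's actual tests weren't in Go, and the sample document is made up; the point is just that a standalone ~1 KB JSON doc gives the compressor little shared history to exploit.)

```go
package main

import (
	"fmt"

	"github.com/golang/snappy"
)

func main() {
	// A single small JSON document, compressed in isolation.
	doc := []byte(`{"type":"user","name":"example","roles":["admin","editor"],"active":true,"score":42}`)

	compressed := snappy.Encode(nil, doc)
	fmt.Printf("original %d bytes, compressed %d bytes (ratio %.2f)\n",
		len(doc), len(compressed),
		float64(len(doc))/float64(len(compressed)))

	decoded, err := snappy.Decode(nil, compressed)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", string(decoded) == string(doc))
}
```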

@snej
Contributor Author

snej commented Jan 11, 2018

> Much of the compression comes from having a good-sized corpus, which isn't the case when compressing just one doc at a time.

BLIP now uses inter-message compression: it keeps a single compression context for the entire stream and runs each frame through it, so we do get [almost] the same compression as if it were all one big document.

(I say "[almost]" because we do have to flush the compressor at the end of every message, and that adds some overhead; it looks like zlib has to rewrite some Huffman encoding state at the start of every block.)

@daverigby

Cool; that'll certainly give different considerations.

BTW, I thought this was a really nice way of looking at the tradeoffs between the different algorithms: http://fastcompression.blogspot.co.uk/p/compression-benchmark.html
