From 662f17ca7829ff45949920bae42e21f39d6dd79f Mon Sep 17 00:00:00 2001
From: eyaler
Date: Thu, 27 Oct 2022 16:52:09 +0300
Subject: [PATCH] added huffman refs

---
 TODO.md         | 2 +-
 ztml/huffman.py | 9 +++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/TODO.md b/TODO.md
index f33123b..e9d46ad 100644
--- a/TODO.md
+++ b/TODO.md
@@ -13,7 +13,7 @@
 - Ablation benchmarks
 - Auto-caps should use modifiers for next letter/word/sentence/paragraph or block-level, over simple mode instead of falling back to raw
 - Dictionary compression for long texts
-- [Fast Huffman one-shift decoder](https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes)
+- [Fast Huffman one-shift decoder](https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes), or [follow-up](https://arxiv.org/pdf/1410.3438.pdf) [works](https://arxiv.org/pdf/2108.05495.pdf)
 - [Base139](https://github.com/kevinAlbs/Base122/issues/3#issuecomment-263787763)
 - Compress the JS itself and use eval, considering also JS packing e.g. [JSCrush](http://iteral.com/jscrush), [RegPack](https://siorki.github.io/regPack), [Roadroller](https://lifthrasiir.github.io/roadroller)
 - Benchmark [Roadroller](https://lifthrasiir.github.io/roadroller) entropy coding
diff --git a/ztml/huffman.py b/ztml/huffman.py
index 2667eea..7fc8e32 100644
--- a/ztml/huffman.py
+++ b/ztml/huffman.py
@@ -3,15 +3,16 @@
 Even though we later compress with DEFLATE which does its own Huffman encoding internally,
 I found that for text compression, it is significantly beneficial to pre-encode with Huffman.
 Canonical encoding obviates saving or reconstructing an explicit codebook.
-Instead, we save a string of symbols ordered by increasing frequency,
-and a sparse dictionary from codeword lengths to bases and offsets
-(see paper, but note it is my custom implementation).
+Instead, we save a string of symbols and a sparse dictionary from codeword lengths to bases and offsets
+(see Moffat paper, but note it is my custom implementation).
 A minimalistic JS decoder code is generated.

 References:
 https://wikipedia.org/wiki/Canonical_Huffman_code
 https://github.com/ilanschnell/bitarray/blob/master/doc/canonical.rst
-https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes
+https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes (Moffat)
+https://arxiv.org/pdf/1410.3438.pdf
+https://arxiv.org/pdf/2108.05495.pdf
 """
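
Note on the docstring above: the idea of storing only a symbol string plus a sparse map from codeword length to (base, offset) can be illustrated with a small sketch. This is not the code in ztml/huffman.py; the function names (`code_lengths`, `canonical_tables`, `encode`, `decode`), the tie-breaking rule, and the ordering of symbols by (length, symbol) are assumptions made for illustration, and the input is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count


def code_lengths(text):
    """Return {symbol: codeword length} from a plain Huffman merge (hypothetical helper)."""
    freqs = Counter(text)
    if len(freqs) == 1:  # a lone symbol still needs a one-bit codeword
        return {next(iter(freqs)): 1}
    tiebreak = count()  # unique counter so the symbol lists are never compared
    heap = [(f, next(tiebreak), [s]) for s, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)
        f2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:  # every merge pushes these symbols one level deeper
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), s1 + s2))
    return lengths


def canonical_tables(lengths):
    """Return (symbols, table), where table maps length -> (base code, offset into symbols)."""
    # Assumed ordering: by codeword length, then by symbol.
    symbols = ''.join(sorted(lengths, key=lambda s: (lengths[s], s)))
    table = {}
    code = prev_len = 0
    for i, s in enumerate(symbols):
        n = lengths[s]
        if n != prev_len:  # first symbol of a new length: record its base and offset
            code <<= n - prev_len
            table[n] = (code, i)
            prev_len = n
        code += 1
    return symbols, table


def encode(text, symbols, table, lengths):
    """Encode text as a string of '0'/'1' characters."""
    bits = []
    for ch in text:
        n = lengths[ch]
        base, offset = table[n]
        bits.append(format(base + symbols.index(ch) - offset, f'0{n}b'))
    return ''.join(bits)


def decode(bits, symbols, table):
    """Decode using only the symbol string and the sparse length table."""
    lens = sorted(table)
    # Exclusive end index in `symbols` of each length's block.
    end = {n: table[m][1] for n, m in zip(lens, lens[1:])}
    end[lens[-1]] = len(symbols)
    out = []
    code = n = 0
    for b in bits:
        code = code << 1 | int(b)
        n += 1
        if n in table:
            base, offset = table[n]
            i = code - base + offset
            if i < end[n]:  # the bits read so far form a complete codeword
                out.append(symbols[i])
                code = n = 0
    return ''.join(out)
```

A round trip would look like `lengths = code_lengths(text)`, `symbols, table = canonical_tables(lengths)`, then `decode(encode(text, symbols, table, lengths), symbols, table)`. The decoder never sees an explicit codebook: the table has one entry per codeword length that actually occurs, which is why it stays sparse.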