added huffman refs

eyaler · Oct 27, 2022 · 662f17c
1 parent 1f0dffa
commit 662f17c

Showing 2 changed files with 6 additions and 5 deletions.
diff --git a/TODO.md b/TODO.md
@@ -13,7 +13,7 @@
 - Ablation benchmarks
 - Auto-caps should use modifiers for next letter/word/sentence/paragraph or block-level, over simple mode instead of falling back to raw
 - Dictionary compression for long texts
-- [Fast Huffman one-shift decoder](https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes)
+- [Fast Huffman one-shift decoder](https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes), or [follow-up](https://arxiv.org/pdf/1410.3438.pdf) [works](https://arxiv.org/pdf/2108.05495.pdf)
 - [Base139](https://github.com/kevinAlbs/Base122/issues/3#issuecomment-263787763)
 - Compress the JS itself and use eval, considering also JS packing e.g. [JSCrush](http://iteral.com/jscrush), [RegPack](https://siorki.github.io/regPack), [Roadroller](https://lifthrasiir.github.io/roadroller)
 - Benchmark [Roadroller](https://lifthrasiir.github.io/roadroller) entropy coding

diff --git a/ztml/huffman.py b/ztml/huffman.py
@@ -3,15 +3,16 @@
 Even though we later compress with DEFLATE which does its own Huffman encoding internally,
 I found that for text compression, it is significantly beneficial to pre-encode with Huffman.
 Canonical encoding obviates saving or reconstructing an explicit codebook.
-Instead, we save a string of symbols ordered by increasing frequency,
-and a sparse dictionary from codeword lengths to bases and offsets
-(see paper, but note it is my custom implementation).
+Instead, we save a string of symbols and a sparse dictionary from codeword lengths to bases and offsets
+(see Moffat paper, but note it is my custom implementation).
 A minimalistic JS decoder code is generated.
 
 References:
 https://wikipedia.org/wiki/Canonical_Huffman_code
 https://github.com/ilanschnell/bitarray/blob/master/doc/canonical.rst
-https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes
+https://researchgate.net/publication/3159499_On_the_implementation_of_minimum_redundancy_prefix_codes (Moffat)
+https://arxiv.org/pdf/1410.3438.pdf
+https://arxiv.org/pdf/2108.05495.pdf
 """