Commit 3ac200e

Author: Jordi Montes
HyperBitBit class documented.
1 parent 8ff6cdb, commit 3ac200e

1 file changed: +34 -0 lines changed


src/main/java/com/clearspring/analytics/stream/cardinality/HyperBitBit.java

Lines changed: 34 additions & 0 deletions
@@ -20,12 +20,46 @@

 import java.io.IOException;

+/**
+ * Java implementation of the HyperBitBit (HBB) algorithm, as presented in the talk
+ * by Robert Sedgewick:
+ * <p/>
+ * https://www.cs.princeton.edu/~rs/talks/AC11-Cardinality.pdf
+ * <p/>
+ * HBB aims to beat HyperLogLog.
+ * From the talk, on practical data:
+ * - HyperBitBit, for N < 2^64,
+ * - Uses 128 + 6 bits (128 + 8 in this implementation).
+ * - Estimates cardinality within 10% of the actual value.
+ * <p/>
+ * The algorithm still needs some improvements:
+ * - Inserting the same element twice can change the structure (unlike HLL).
+ * - It does not work at all for small cardinalities.
+ * - The constant 5.4 used in the cardinality estimation formula should be refined
+ *   with feedback from real-world applications.
+ * <p/>
+ * Even so, HyperBitBit has the necessary characteristics to become
+ * a better algorithm than HyperLogLog:
+ * - Makes one pass through the stream.
+ * - Uses a few dozen machine instructions per value.
+ * - Uses a few hundred bits.
+ * - Achieves 10% relative accuracy or better.
+ * <p/>
+ * Any feedback that helps improve the algorithm's weak points is welcome.
+ * <p/>
+ */
+
 public class HyperBitBit implements ICardinality {

     int lgN;
     long sketch;
     long sketch2;

+    /**
+     * Create a new HyperBitBit instance.
+     *
+     * Remember that it does not work well for small cardinalities!
+     */
     public HyperBitBit() {
         lgN = 5;
         sketch = 0;
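
The Javadoc above refers to a cardinality estimation formula with the constant 5.4, and the diff shows the three fields lgN, sketch and sketch2, but the update and estimation code is not part of this commit view. The following is a minimal sketch of how the algorithm from the linked talk is usually written, not the code in HyperBitBit.java. The class name HyperBitBitSketch, the mix64 hash (a SplitMix64-style mixer), the choice of which hash bits select the bucket, and the long-only offer method are all assumptions made for illustration.

```java
/**
 * Illustrative HyperBitBit sketch. NOT the code from this commit:
 * the class name, hash function and bit-splitting choices are assumptions.
 */
final class HyperBitBitSketch {

    private int lgN = 5;      // same starting value as the constructor in the diff
    private long sketch = 0;  // 64 one-bit buckets for threshold lgN
    private long sketch2 = 0; // 64 one-bit buckets for threshold lgN + 1

    /** SplitMix64-style mixer, used here as a stand-in 64-bit hash (assumption). */
    static long mix64(long x) {
        long z = x + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Feed one value into the sketch. */
    void offer(long value) {
        long h = mix64(value);
        int k = (int) (h >>> 58);               // top 6 bits pick one of 64 buckets
        int r = Long.numberOfTrailingZeros(~h); // length of the run of trailing 1-bits

        if (r > lgN) {
            sketch |= 1L << k;
        }
        if (r > lgN + 1) {
            sketch2 |= 1L << k;
        }
        // Once more than half of the buckets are set, move to the next scale:
        // sketch2 becomes the active sketch and a fresh sketch2 is started.
        if (Long.bitCount(sketch) > 31) {
            sketch = sketch2;
            lgN++;
            sketch2 = 0;
        }
    }

    /** Estimate: 2^(lgN + 5.4 + popcount(sketch) / 32). */
    long cardinality() {
        return (long) Math.pow(2, lgN + 5.4 + Long.bitCount(sketch) / 32.0);
    }

    public static void main(String[] args) {
        HyperBitBitSketch hbb = new HyperBitBitSketch();
        long n = 1_000_000L;
        for (long i = 0; i < n; i++) {
            hbb.offer(i);
        }
        System.out.println("true = " + n + ", estimate = " + hbb.cardinality());
    }
}
```

Two of the weaknesses listed in the Javadoc can be read directly off this sketch. With lgN = 5 and an empty sketch, the formula already returns 2^10.4, about 1351, so small cardinalities cannot be reported. And because sketch2 is cleared at every rollover, offering an earlier value again can set a bit that the cleared sketch2 no longer holds, so duplicates can change the state.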

0 commit comments
