|
20 | 20 |
|
21 | 21 | import java.io.IOException;
|
22 | 22 |
|
| 23 | +/** |
| 24 | + * Java implementation of the HyperBitBit (HBB) algorithm, as described in the presentation |
| 25 | + * by Robert Sedgewick: |
| 26 | + * <p/> |
| 27 | + * https://www.cs.princeton.edu/~rs/talks/AC11-Cardinality.pdf |
| 28 | + * <p/> |
| 29 | + * HBB aims to beat HyperLogLog. |
| 30 | + * From the talk, on practical data: |
| 31 | + * - HyperBitBit, for N < 2^64, |
| 32 | + * - Uses 128 + 6 bits (128 + 8 in this implementation). |
| 33 | + * - Estimates cardinality within 10% of the actual. |
| 34 | + * <p/> |
| 35 | + * The algorithm still needs some improvements: |
| 36 | + * - Inserting the same element twice can change the structure (unlike in HLL). |
| 37 | + * - It does not work AT ALL for small cardinalities. |
| 38 | + * - The constant 5.4 used in the cardinality estimation formula should be refined |
| 39 | + * with feedback from real-world applications. |
| 40 | + * <p/> |
| 41 | + * Even so, HyperBitBit has the necessary characteristics to become |
| 42 | + * a better algorithm than HyperLogLog: |
| 43 | + * - Makes one pass through the stream. |
| 44 | + * - Uses a few dozen machine instructions per value |
| 45 | + * - Uses a few hundred bits |
| 46 | + * - Achieves 10% relative accuracy or better |
| 47 | + * <p/> |
| 48 | + * Any feedback that helps improve the algorithm's weak points is welcome. |
| 49 | + * <p/> |
| 50 | + */ |
| 51 | + |
23 | 52 | public class HyperBitBit implements ICardinality {
|
24 | 53 |
|
25 | 54 | int lgN;
|
26 | 55 | long sketch;
|
27 | 56 | long sketch2;
|
28 | 57 |
|
| 58 | + /** |
| 59 | + * Create a new HyperBitBit instance. |
| 60 | + * |
| 61 | + * Remember that it does not work well for small cardinalities! |
| 62 | + */ |
29 | 63 | public HyperBitBit() {
|
30 | 64 | lgN = 5;
|
31 | 65 | sketch = 0;
|
|