A pure Rust implementation of succinct and compressed data structures.
This crate started is part of the Sux project; it contains also code ported from the DSI Utilities and new structures.
Presently, it provides:
- bit vectors and bit-field vectors;
- several structures for rank and selection with different tradeoffs;
- indexed dictionaries, including an implementation of the Elias–Fano representation of monotone sequences and lists of strings compressed by prefix omission.
The focus is on efficiency (in particular, there are unchecked versions of all methods) and on flexible composability (e.g., you can fine-tune your Elias–Fano instance by choosing different types of internal indices, and whether to index zeros or ones).
All structures in this crate are designed to work well with ε-serde: in
particular, once you have created and serialized them, you can easily map them
into memory or load them in memory regions with specific mmap()
attributes.
All structures in this crate support the MemDbg
and MemSize
traits from
the mem_dbg
crate, which provide convenient facilities for inspecting memory
usage and debugging memory-related issues. For example, this is the output of
mem_dbg()
on a large EliasFano
instance:
117_041_232 B 100.00% ⏺: sux::dict::elias_fano::EliasFano<sux::rank_sel::select_zero_adapt_const::SelectZeroAdaptConst<sux::rank_sel::select_adapt_const::SelectAdaptConst>>
8 B 0.00% ├╴u: usize
8 B 0.00% ├╴n: usize
8 B 0.00% ├╴l: usize
75_000_048 B 64.08% ├╴low_bits: sux::bits::bit_field_vec::BitFieldVec
75_000_024 B 64.08% │ ├╴data: alloc::vec::Vec<usize>
8 B 0.00% │ ├╴bit_width: usize
8 B 0.00% │ ├╴mask: usize
8 B 0.00% │ ╰╴len: usize
42_041_160 B 35.92% ╰╴high_bits: sux::rank_sel::select_zero_adapt_const::SelectZeroAdaptConst<sux::rank_sel::select_adapt_const::SelectAdaptConst>
35_937_608 B 30.71% ├╴bits: sux::rank_sel::select_adapt_const::SelectAdaptConst
32_031_296 B 27.37% │ ├╴bits: sux::bits::bit_vec::CountBitVec
32_031_280 B 27.37% │ │ ├╴data: alloc::vec::Vec<usize>
8 B 0.00% │ │ ├╴len: usize
8 B 0.00% │ │ ╰╴number_of_ones: usize
3_906_312 B 3.34% │ ╰╴inventory: alloc::vec::Vec<u64>
6_103_552 B 5.21% ╰╴inventory: alloc::vec::Vec<u64>
The design of this crate tries to satisfy the following principles:
- High performance: all implementations try to be as fast as possible (we try to minimize cache misses, then tests, and then instructions).
- Composability: all structures are designed to be easily composed with each
other; structures are built on top of other structures, which
can be extracted with the usual
into_inner
idiom. - Zero-cost abstraction: all structures forward conditionally all ranking/selection non-implemented methods on the underlying structures.
- Functoriality: whenever possible, there are mapping methods that replace an underlying structure with another one, provided it is compatible.
What this crate does not provide:
- High genericity: all bit vectors are based on the rather concrete trait combination
AsRef<[usize]>
+BitLength
.
This software has been partially supported by project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU, and by project ANR COREGRAPHIE, grant ANR-20-CE23-0002 of the French Agence Nationale de la Recherche. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them