bitvec
’s core value proposition rests on the fact that it is capable of
defining an unsized slice type, and controlling references to it. The Rust
language rests heavily on the two reference types &
and &mut
, and does not
ordinarily allow these to be faked or created by anything other than the
compiler.
It so happens that not only does rust strongly guarantee the in-memory layout
of a reference to a slice, it also provides a stable API for
constructing values of &[T]
type without using mem::transmute
. Subject to
certain value requirements imposed by types, slice references can be constructed
through these functions and the compiler will accept them as valid.
These requirements traditionally make it difficult to encode non-address information into a bare reference, since the compiler has a very firm expectation that a reference to a type is immediately dereferenceäble to a value of that type, but if your type happens to be zero-sized, then it can never exist in memory, no loads or stores to it can ever be produced, and the compiler no longer concerns itself with the actual bit-pattern value of references to it.
Which is why the definition of BitSlice
is
// src/slice.rs
#[repr(transparent)]
pub struct BitSlice<T, O>
where
T: BitStore,
O: BitOrder,
{
_mem: [()],
_typ: PhantomData<T>,
_ord: PhantomData<O>,
}
BitSlice
is [()]
with some markers that only the type-checker can see.
&BitSlice
is thus &[()]
, and &[()]
can have any values it wants (except,
of course, null).
Slice references contain two pieces of information: the address of the base element, and the number of elements, starting at the base, contained in the slice region. Theoretically, bit-slice references have the same pair of information: the address of the first bit, and the number of bits in the region.
However, computers are byte-addressed, not bit-addressed, so we need to store
three more bits (to select a bit in the base byte) somewhere in the reference.
Since slice references are defined as { base: *T, elts: usize }
, and there are
no1 spare bits in *const _
, the bits to store the base bit are taken out of
the length counter.
Reference address values are also required to be integer multiples of the
alignment of the referent type T
. This alignment is, on all supported targets,
the width in bytes of the referent type. As a result, there are as many low bits
in the address of any T
that are guaranteed to be the 0
value, as there
are bits needed to select a byte within the element. The end result is that the
length counter must always use three bits to store the starting bit, and the
base address will be composed of an aligned T
address and an index of the
starting byte within it.
As Rust does not have bitfield syntax, a definition of the pointer structure in C++ looks like this:
template <typename T>
struct BitSpan<T> {
static_assert(
std::is_unsigned<T>
&& sizeof(T) <= sizeof(size_t)
&& sizeof(T) <= alignof(T)
);
uintptr_t ptr_head : __builtin_ctzll(alignof(T));
uintptr_t ptr_addr : sizeof(uintptr_t) * CHAR_BITS;
- __builtin_tczll(alignof(T));
size_t len_head : 3;
size_t len_bits : sizeof(size_t) * 8 - 3;
}
In Rust, the structure is declared as
// src/pointer.rs
#[repr(C)]
pub struct BitSpan<T, O>
where
T: BitStore,
O: BitOrder,
{
ptr: NonNull<u8>,
len: usize,
_ty: PhantomData<T>,
_or: PhantomData<O>,
}
and the logical components must be accessed through get/set functions, rather than through compiler-generated field stubs.
By marking the pointer as NonNull
, BitSpan
declares that it will never be a
null pointer and becomes subject to the same peephole optimization that allows
mem::size_of::<Option<&T>>() == mem::size_of::<&T>()
. By marking it as
unconditionally a pointer to u8
, we declare that all low bits of the address
value are in use, and none can be used as slots for anything else (since our
encoding is using them to select a byte within the T
).
The null value, { ptr: 0, len: 0 }
, is not valid in BitSpan<T>
, but rather is
used to mark Option::<BitSpan<T>>::None
.
All slices with a bits
logical field are considered equally empty, and equal
to each other. The addr
and head
logical fields can be anything. However,
they are not required to be clamped to 1
and 0
, respectively, because they
can contain important information about the region.
Specifically, the BitVec
type is an owning type that manages a buffer assigned
to it by the memory allocator. It is never allowed to forget the address of its
buffer, even if the user has removed all live bits from the vector.
Rust requires that slice references have a specific ABI, but makes no
requirements about the encoding of values of those references for certain types.
We can supply our own ABI-equivalent structure, define functions that use the
structural encoding to compute the information needed to actually interact with
memory, and convert our structures into Rust-accepted slices through the
provided compiler API in core
.
Footnotes
-
On AMD64, pointers are actually aggregates of MMU translation pages, and processors only decode the low 48 or 57 bits of them, leaving the high 16 or 7 bits available for other information not part of the memory addressing system. However, these processors also trap when attempting to dereference a pointer whose high
[48:]
or[57:]
bits do not have the same bit value as bit[47]
or[56]
, and that bit is typically used to differentiate unprivileged user memory from privileged kernel memory. Furthermore, this dead region does not exist on 32-bit architectures, x86 or otherwise, and sincebitvec
explicitly supports 32-bit systems, the use of dead bits only present on a subset of supported targets and subject to their own extra rules ↩