Like any modern, high-performance allocator, snmalloc
contains multiple layers of allocation.
We give here some notes on the internal orchestration.
Consider a first, "small" allocation (typically less than a platform page); such allocations showcase more of the machinery. For simplicity, we assume that
- this is not an
OPEN_ENCLAVE
build, - the
BackendAllocator
has not been told to use afixed_range
, - this is not a
SNMALLOC_CHECK_CLIENT
build, and - (as a consequence of the above)
SNMALLOC_META_PROTECTED
is not#define
-d.
Since this is the first allocation, all the internal caches will be empty, and so we will hit all the slow paths. For simplicity, we gloss over much of the "lazy initialization" that would actually be implied by a first allocation.
-
The
LocalAlloc::small_alloc
finds that it cannot satisfy the request because itsLocalCache
lacks a free list for this size class. The request is delegated, unchanged, toCoreAllocator::small_alloc
. -
The
CoreAllocator
has no active slab for this sizeclass, soCoreAllocator::small_alloc_slow
delegates toBackendAllocator::alloc_chunk
. At this point, the allocation request is enlarged to one or a few chunks (a small counting number multiple ofMIN_CHUNK_SIZE
, which is typically 16KiB); seesizeclass_to_slab_size
. -
BackendAllocator::alloc_chunk
at this point splits the allocation request in two, allocating both the chunk's metadata structure (of sizePAGEMAP_METADATA_STRUCT_SIZE
) and the chunk itself (a multiple ofMIN_CHUNK_SIZE
). Because the two exercise similar bits of machinery, we now track them in parallel in prose despite their sequential nature. -
The
BackendAllocator
has a chain of "range" types that it uses to manage address space. By default (and in the case we are considering), that chain begins with a per-thread "small buddy allocator range".-
For the metadata allocation, the size is (well) below
MIN_CHUNK_SIZE
and so this allocator, which by supposition is empty, attempts torefill
itself from its parent. This results in a request for aMIN_CHUNK_SIZE
chunk from the parent allocator. -
For the chunk allocation, the size is
MIN_CHUNK_SIZE
or larger, so this allocator immediately forwards the request to its parent.
-
-
The next range allocator in the chain is a per-thread large buddy allocator that refills in 2 MiB granules. (2 MiB chosen because it is a typical superpage size.) At this point, both requests are for at least one and no more than a few times
MIN_CHUNK_SIZE
bytes.-
The first request will
refill
this empty allocator by making a request for 2 MiB to its parent. -
The second request will stop here, as the allocator will no longer be empty.
-
-
The chain continues with a
CommitRange
, which simply forwards all allocation requests and (upon unwinding) ensures that the address space is mapped. -
The chain now transitions from thread-local to global; the
GlobalRange
simply serves to acquire a lock around the rest of the chain. -
The next entry in the chain is a
StatsRange
which serves to accumulate statistics. We ignore this stage and continue onwards. -
The next entry in the chain is another large buddy allocator which refills at 16 MiB but can hold regions of any size up to the entire address space. The first request triggers a
refill
, continuing along the chain as a 16 MiB request. (Recall that the second allocation will be handled at an earlier point on the chain.) -
The penultimate entry in the chain is a
PagemapRegisterRange
, which always forwards allocations along the chain. -
At long last, we have arrived at the last entry in the chain, a
PalRange
. This delegates the actual allocation, of 16 MiB, to either thereserve_aligned
orreserve
method of the Platform Abstraction Layer (PAL). -
Having wound the chain onto our stack, we now unwind! The
PagemapRegisterRange
ensures that the Pagemap entries for allocations passing through it are mapped and returns the allocation unaltered. -
The global large buddy allocator splits the 16 MiB refill into 8, 4, and 2 MiB regions it retains as well as returning the remaining 2 MiB back along the chain.
-
The
StatsRange
makes its observations, theGlobalRange
now unlocks the global component of the chain, and theCommitRange
ensures that the allocation is mapped. Aside from these side effects, these propagate the allocation along the chain unaltered. -
We now arrive back at the thread-local large buddy allocator, which takes its 2 MiB refill and breaks it down into powers of two down to the requested
MIN_CHUNK_SIZE
. The second allocation (of the chunk), will either return or again break down one of these intermediate chunks. -
For the first (metadata) allocation, the thread-local small allocator breaks the
MIN_CHUNK_SIZE
allocation down into powers of two down toPAGEMAP_METADATA_STRUCT_SIZE
and returns one of that size. The second allocation will have been forwarded and so is not additionally handled here.
Exciting, no?
The centerpiece of snmalloc
's metadata is its PageMap
, which associates each "chunk" of the address space (~16KiB; see MIN_CHUNK_BITS
) with a MetaEntry
.
A MetaEntry
is a pair of pointers, suggestively named meta
and remote_and_sizeclass
.
In more detail, MetaEntry
s are better represented by Sigma and Pi types, all packed into two pointer-sized words in ways that preserve pointer provenance on CHERI.
To begin decoding, a bit (REMOTE_BACKEND_MARKER
) in remote_and_sizeclass
distinguishes chunks owned by frontend and backend allocators.
For chunks owned by the frontend (REMOTE_BACKEND_MARKER
not asserted),
-
The
remote_and_sizeclass
field is a product of-
A
RemoteAllocator*
indicating theLocalAlloc
that owns the region of memory. -
A "full sizeclass" value (itself a tagged sum type between large and small sizeclasses).
-
-
The
meta
pointer is a bit-stuffed pair of-
A pointer to a larger metadata structure with type dependent on the role of this chunk
-
A bit (
META_BOUNDARY_BIT
) that serves to limit chunk coalescing on platforms where that may not be possible, such as CHERI.
-
See src/backend/metatypes.h
and src/mem/metaslab.h
.
For chunks owned by a backend (REMOTE_BACKEND_MARKER
asserted), there are again multiple possibilities.
For chunks owned by a small buddy allocator, the remainder of the MetaEntry
is zero.
That is, it appears to have small sizeclass 0 and an implausible RemoteAllocator*
.
For chunks owned by a large buddy allocator, the MetaEntry
is instead a node in a red-black tree of all such chunks.
Its contents can be decoded as follows:
-
The
meta
field'sMETA_BOUNDARY_BIT
is preserved, with the same meaning as in the frontend case, above. -
meta
(resp.remote_and_sizeclass
) includes a pointer to the left (resp. right) chunk of address space. (The corresponding child node in this tree is found by taking the address of this chunk and looking up theMetaEntry
in the Pagemap. This trick of pointing at the child's chunk rather than at the childMetaEntry
is particularly useful on CHERI: it allows us to capture the authority to the chunk without needing another pointer and costs just a shift and add.) -
The
meta
field'sLargeBuddyRep::RED_BIT
is used to carry the red/black color of this node.
See src/backend/largebuddyrange.h
.
We can also consider the process for generating a MetaEntry for a chunk of the address space given its state. The following cases apply:
-
The address is not associated with
snmalloc
: Here, theMetaEntry
, if it is mapped, is all zeros and so it...- has
REMOTE_BACKEND_MARKER
clear inremote_and_sizeclass
. - appears to be owned by a frontend RemoteAllocator at address 0 (probably, but not certainly,
nullptr
). - has "small" sizeclass 0, which has size 0.
- has no associated metadata structure.
- has
-
The address is part of a free chunk in a backend's Large Buddy Allocator: The
MetaEntry
...- has
REMOTE_BACKEND_MARKER
asserted inremote_and_sizeclass
. - has "small" sizeclass 0, which has size 0.
- the remainder of its
MetaEntry
structure will be a Large Buddy Allocator rbtree node. - has no associated metadata structure.
- has
-
The address is part of a free chunk inside a backend's Small Buddy Allocator: Here, the
MetaEntry
is zero aside from the assertedREMOTE_BACKEND_MARKER
bit, and so it...- has "small" sizeclass 0, which has size 0.
- has no associated metadata structure.
-
The address is part of a live large allocation (spanning one or more 16KiB chunks): Here, the
MetaEntry
...- has
REMOTE_BACKEND_MARKER
clear inremote_and_sizeclass
. - has a large sizeclass value.
- has an associated
RemoteAllocator*
andMetaslab*
metadata structure (holding just the original chunk pointer in itsMetaCommon
substructure; it is configured to always trigger the deallocation slow-path to skip the logic when a chunk is in use as a slab).
- has
-
The address, whether or not it is presently within an allocated object, is part of an active slab. Here, the
MetaEntry
....- encodes the small sizeclass of all objects in the slab.
- has a
RemoteAllocator*
referencing the owningLocalAlloc
's message queue. - points to the slab's
Metaslab
structure containing additional metadata (e.g., free list).