Document MaybeUninit bit validity #140463

joshlf · 2025-04-29T14:56:46Z

Partially addresses rust-lang/unsafe-code-guidelines#555 by clarifying that it is sound to write any byte values (initialized or uninitialized) to any MaybeUninit<T> regardless of T.

r? @RalfJung

library/core/src/mem/maybe_uninit.rs

joshlf · 2025-04-29T15:19:13Z

library/core/src/mem/maybe_uninit.rs

 /// }
 /// ```
+///
+/// # Validity


Moving this discussion here:

The MaybeUninit docs probably make sense for this. We now do have a definition of "byte" in the reference that this can link to.

Okay, awesome. And what wording would you recommend? Would it be accurate to say something like the following?

The value of a [MaybeUninit<u8>; N] may contain pointer provenance, and so p: P -> [MaybeUninit<u8>; N] -> P preserves the value of p, including provenance

@RalfJung would you like me to add language like this to this PR?

Update: I've added the following as a more concrete and fleshed out draft. I can edit or remove as preferred.

/// # Provenance /// /// `MaybeUninit` values may contain [pointer provenance][provenance]. Concretely, for any /// pointer type, `P`, which contains provenance, transmuting `p: P` to /// `MaybeUninit<[u8; size_of::<P>]>` and then back to `P` will produce a value identical to /// `p`, including provenance. /// /// [provenance]: ../ptr/index.html#provenance

Updated to reference the definition of a byte.

RalfJung · 2025-05-07T13:11:41Z

Cc @rust-lang/opsem

RalfJung · 2025-05-07T13:14:13Z

library/core/src/mem/maybe_uninit.rs

+/// If `T` contains initialized bytes at byte offsets where `U` contains padding bytes, these
+/// may not be preserved in `MaybeUninit<U>`, and so `transmute(u)` may produce a `T` with
+/// uninitialized bytes in these positions. This is an active area of discussion, and this code
+/// may become sound in the future.


I don't think it makes sense to say that a type "contains initialized bytes" at some offset. That's a property of a representation.

The typical term for representation bytes that are lost here is "padding". I don't think we have rigorously defined padding anywhere yet, but the term is sufficiently widely-used (and generally with a consistent meaning) that we may just be able to use it here?

IIUC, you're making two points:

We should speak about a type's representation containing bytes, not about the type itself containing bytes

In a representation, we should speak about padding bytes rather than uninitialized bytes

Is that right?

One thing that's probably worth distinguishing here is between values and layouts. In my mental model, an uninit byte is one of the possible values that a byte can have (e.g., it's the 257th value that can legally appear in a MaybeUninit<u8>). By contrast, padding is a property of a layout - namely, it's a sequence of bytes in a type's layout that happen to have the validity [MaybeUninit<u8>; PADDING_LEN].

Based on this, maybe it's best to say:

If byte offsets exists at which T's representation does not permit uninitialized bytes but U's representation does (e.g. due to padding), then the bytes in T at these offsets may not be preserved in u, and so transmute(u) may produce a T with uninitialized bytes at these offsets. This is an active area of discussion, and this code may become sound in the future.

Is that right?

No. I think both of the following concepts make sense:

The representation of a particular value at a particular type contains uninitialized bytes.

A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)

But it makes less sense to talk about padding of a representation, or to talk about uninitialized bytes in a type.

So for this PR, the two key points (and they are separate points) are:

If U has padding, those bytes may be reset to "uninitialized" as part of the round-trip. If those same bytes are not padding in T, this can therefore mean some of the information of the original T value is lost.

If T does not permit uninitialized bytes on those positions, the round-trip is UB.

The second point is just a logical consequence of the first, it does not add any new information. Not sure if it is worth mentioning.

The representation of a particular value at a particular type contains uninitialized bytes.

A type contains padding bytes. (These are bytes which are always ignored by the representation relation.)

Does this imply that a type contains padding bytes, not a type's representation?

I'm thinking through the implications of what you said, and I think I understand something new that I didn't before, and I want to run it by you: In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid (enums complicate this model, but I don't think that complication is relevant for this discussion - we can just stick to thinking about structs). The problem with this mental model is that, interpreted naively, it implies that different byte values in a padding byte could correspond to different logical values of the type. So e.g. in the type #[repr(C)] struct T(u8, u16), [0, 0, 0, 0] and [0, 1, 0, 0] would correspond to different values of the type since we're treating the padding byte itself as part of the representation relation. Of course, that is not something we want.

IIUC, by contrast your model is that the representation relation simply doesn't include padding bytes at all. So it'd be more accurate to describe the representation of T as consisting of three bytes - at offsets 0, 2, and 3. Every representation of T has a "hole" at offset 1 which is not part of the representation. This ensures that there's a 1:1 mapping between logical values and representations. Is that right?

Does this imply that a type contains padding bytes, not a type's representation?

That's how I think about it. We can't tell which byte is a padding byte by looking at one representation -- it's a property of the type.

In my existing mental model, a padding byte is a location in a type's layout such that every byte value at that location (including uninit) is valid

That would make the only byte of MaybeUninit<u8> a padding byte, so I don't think this is the right definition.
That's why I said above: a padding byte is a byte that is ignored by the representation relation. Slightly more formally: if r is some representation valid for type T, and r' is equal to r everywhere except for padding bytes, then r and r' represent the same value.

So it'd be more accurate to describe the representation of T as consisting of three bytes

The representation has 4 bytes. But only 3 of them actually affect the represented value (which is a tuple of two [mathematical] integers).

We seem to be using the term "representation" slightly differently. For me, that's list a List<Byte> of appropriate length. You may be using that term to refer to what I call "representation relation"?

We seem to be using the term "representation" slightly differently. For me, that's list a List<Byte> of appropriate length. You may be using that term to refer to what I call "representation relation"?

That's helpful, thank you!

To avoid rabbit holing too much on the definitions (although it's interesting and useful – just maybe a bit of a distraction here), maybe you could propose language you'd prefer to see in place of what I've written here?

library/core/src/mem/maybe_uninit.rs

RalfJung · 2025-05-23T13:49:33Z

@rustbot ready

library/core/src/mem/maybe_uninit.rs

RalfJung · 2025-05-30T12:20:32Z

@rustbot author

rustbot · 2025-05-30T12:20:36Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

RalfJung · 2025-06-01T11:58:10Z

I raised the question on Zulip whether it is wise to make a guarantee here that isn't, strictly speaking, documented in the LLVM LangRef. Nikita says he thinks that that's fine -- we may have to adjust how exactly we compile MaybeUninit in the future, but LLVM currently intends do support this case in a somewhat roundabout and incomplete way that seems to work well enough in practice, and LLVM can't more aggressively exploit the fuzziness along the edges of that approximation until a proper alternative exists.

karolzwolak · 2025-08-24T15:31:48Z

Thanks for your contribution @joshlf from wg-triage.
Could you address the comments above?

joshlf · 2025-08-27T13:18:26Z

Thanks for your contribution @joshlf from wg-triage. Could you address the comments above?

I likely won't have time to move this forward until mid-September or October, but I'll follow up at that point.

joshlf · 2025-08-30T22:18:45Z

Thanks for your contribution @joshlf from wg-triage. Could you address the comments above?

I likely won't have time to move this forward until mid-September or October, but I'll follow up at that point.

Nvm, found some time 🙂

I've responded to various comment threads.

karolzwolak · 2025-08-31T12:17:47Z

Awesome, you should also rebase your changes onto master, and do @rustbot ready when you're ready for review.

joshlf · 2025-08-31T14:06:15Z

@rustbot ready

library/core/src/mem/maybe_uninit.rs

RalfJung · 2025-09-08T15:18:54Z

library/core/src/mem/maybe_uninit.rs

+/// be undefined behavior or yield a value different from `t` due to those bytes being lost. This is an active area of discussion, and this code
+/// may become sound in the future.
+///
+/// Note that, so long as no such byte offsets exist, then the preceding `identity` example *is* sound.


Suggested change

/// Note that, so long as no such byte offsets exist, then the preceding `identity` example *is* sound.

/// However, so long as no such byte offsets exist, then the preceding `identity` example *is* sound.

/// In particular, since `[u8; N]` has no padding bytes, transmuting `t` to `MaybeUninit<[u8; size_of::<T>]>`

/// and back will always produce the original value `t` again. This is true even if `t` contains [provenance]:

/// the resulting value will have the same provenance as the original `t`.

///

/// [provenance]: crate::ptr#provenance

And then we don't need the dedicated provenance section at all, do we?

And then we don't need the dedicated provenance section at all, do we?

Done, and removed the provenance section.

RalfJung · 2025-09-08T20:29:54Z

Thanks, I like it! However, more eyes can't hurt -- @rust-lang/opsem @traviscross (since this is basically Reference-like material).

I also wonder what process we need to invoke, if any. This does document a new guarantee, after all. Nominating for t-lang.

RalfJung · 2025-09-08T20:32:37Z

library/core/src/mem/maybe_uninit.rs

+/// However, so long as no such byte offsets exist, then the preceding `identity` example *is* sound.
+/// In particular, since `[u8; N]` has no padding bytes, transmuting `t` to `MaybeUninit<[u8; size_of::<T>]>`
+/// and back will always produce the original value `t` again. This is true even if `t` contains [provenance]:
+/// the resulting value will have the same provenance as the original `t`.


There's a bit of a footgun for potential misunderstandings here but maybe I am being too nitpicky -- and I don't know what else we could say, anyway:

if T is &mut _ or &_, the resulting value will actually not be identical. The transmute itself is a perfect identity, but when a function return a reference (and this includes transmute returning a reference), that can influence the aliasing model. Tree Borrows will consider the resulting reference a child of the original reference.

Maybe the core issue here is how we use the term "value"? E.g. I might say something like: "let y = x produces a y with the same value as x", but it's semantically relevant that there are now two copies of this value instead of one (e.g. image that x: &mut T). So maybe I'm using "value" in a way that doesn't capture everything about the value? More precisely, maybe I'm using "value" in a way that captures everything local about the value, but doesn't capture everything about the relationship between that value and other values (which is relevant when it comes to aliasing). Of course, provenance complicates this story because we talk about provenance "living inside" a particular value, but it's also a non-local property.

That doesn't really resolve your concern, but some random thoughts. Maybe it'll prompt you to think of better language we could use here?

Another possibility: Instead of saying that the value is fully preserved, maybe we could say that the following contents of the value are preserved?

Bit pattern

Provenance

...and then explicitly disclaim any other value contents that we add to the AM in the future?

It's more a problem of abstraction: the Rust code let y = x; becomes something like y = x; Retag(y) in MIR, and that Retag does change the value. Specifically it changes the provenance, making y a child of x rather than wholly identical to x.

(In fact we sometimes even insert reborrows, making this more like y = &mut *x. I don't know the exact conditions for that.)

So maybe I'm using "value" in a way that doesn't capture everything about the value? More precisely, maybe I'm using "value" in a way that captures everything local about the value, but doesn't capture everything about the relationship between that value and other values (which is relevant when it comes to aliasing). Of course, provenance complicates this story because we talk about provenance "living inside" a particular value, but it's also a non-local property.

Alias tracking is based on provenance. Provenance is also just data. It may reference other data, such as indicating a position in a tree.

Another possibility: Instead of saying that the value is fully preserved, maybe we could say that the following contents of the value are preserved?

Bit pattern

Provenance

...and then explicitly disclaim any other value contents that we add to the AM in the future?

In that case, I'm leaning towards this option unless you have thoughts about language we could use that captures specifically the subset of "value" that we want to address here. How does that sound?

Here's the best I can come up with so far. But I feel like this may actually be easier to write for someone who's less deeply entrenched in these discussions. ;)

Note: if t contains a reference, then there may be implicit reborrows of the reference any time it is copied, which may alter its provenance. In that case, the value returned by identity may not be exactly the same as its argument. However, even in this case, it remains true that identity behaves the same as a function that just returns t immediately.

Okay here's my version of this (which is in the PR now):

Note a potential footgun: if t contains a reference, then there may be implicit reborrows of the reference any time it is copied, which may alter its provenance. In that case, while the value returned by identity is exactly the same as its argument, that value may immediately be reborrowed upon return, altering its provenance. This may make this call to identity behave as though it does not exactly preserve provenance.

while the value returned by identity is exactly the same as its argument

That part isn't given (there can be reborrows inside identity).

In fact my comment explicitly stated "the value returned by identity may not be exactly the same as its argument" so not sure how you got from there to your version.

It also seems very useful to state that it is still equivalent to a "boring" identity function.

In fact my comment explicitly stated "the value returned by identity may not be exactly the same as its argument" so not sure how you got from there to your version.

I thought you were being overly-conservative in your wording, which I realize now isn't the case.

Changed to be closer to your wording. Better?

Note a potential footgun: if t contains a reference, then there may be implicit reborrows of the reference any time it is copied, which may alter its provenance. In that case, the value returned by identity may not be exactly the same as its argument. However, even in this case, it remains true that identity behaves the same as a function that just returns t immediately (i.e., fn identity<T>(t: T) -> T { t }).

That works for me :)

traviscross · 2025-09-16T22:23:48Z

library/core/src/mem/maybe_uninit.rs

+/// ```
+///
+/// If the representation of `t` contains initialized bytes at byte offsets where `U` contains padding bytes, these
+/// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` at type `T` again (i.e., `transmute(u)` above) may thus


Suggested change

/// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` at type `T` again (i.e., `transmute(u)` above) may thus

/// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` as type `T` again (i.e., `transmute(u)` above) may thus

IIUC "at" is intentional language that @RalfJung uses – it's nomenclature used by language theorists.

"interpret u as type T" sounds wrong. We don't interpret u as a type, it's not a type. We use the type to tell us how to interpret u.

But I am not sure if there is widely-used standard terminology here, "interpret x at type y" just sounded most natural to me.

Right. Reading more closely, to use "as" here I'd instead use "as of", e.g. "interpret the bytes of x as being of the type T". I.e., a value "has a type", "is of a type", or is treated or interpreted as "having" or "being of" a type.

I'm pretty happy to use PLT jargon in general, but for that to work we need to use the piece of jargon pervasively enough that people pick up on it (and probably discuss the jargon in our notation guide or elsewhere). If we just use it in one place, people will just think it's a typo.

I don't even know if this is PLT jargon or just Ralf jargon. ;)

What if we instead say this?

Transmuting u to T (i.e., transmute(u) above) may thus...

That also works for me. I'd rather say "Transmuting u back to T" (to preserve the "again" in the current version).

traviscross · 2025-09-16T22:55:53Z

We briefly reviewed this is the last lang triage meeting, but it's the sort of thing that's hard to really look through in a meeting, so we deferred it to async review.

@ehuss and I are looking at it now, on the @rust-lang/lang-docs side, as it is Reference-like material, as @RalfJung said. We agreed it'd be good clarify the bit raised in #140463 (comment) regarding provenance and transmute. As discussed in that thread, it'd probably be best to just describe the situation here, in terms of how the surrounding code effects a reborrow, and give an example or demonstration of code that's equivalent in this model.

When that's done, the typo I noted is fixed, and the text is wrapped more consistently, I'll r+ it.

Process-wise, thanks for pinging lang here; that was correct. In fact, let's plan to ping @rust-lang/lang and @rust-lang/lang-docs both on this sort of thing.

Co-authored-by: Ralf Jung <[email protected]> Edited-by: TC

rustbot · 2025-09-21T13:35:09Z

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

traviscross · 2025-09-21T13:38:40Z

I've revised the text for hopefully better clarity. In particular, as @RalfJung had suggested above, I've stated the guarantees upfront in positive form and then followed with the various caveats. This seemed desirable to me, and will hopefully ease lang review.

@RalfJung, @joshlf, does this look correct? (Did I break anything?)

Let's rewrite this for better clarity. In particular, let's document our language guarantees upfront and in positive form. We'll then list the caveats and the non-guarantees after.

rustbot assigned RalfJung Apr 29, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 29, 2025

joshlf commented Apr 29, 2025

View reviewed changes

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

joshlf mentioned this pull request Apr 29, 2025

Do typed copies of unions preserve "invalid" bytes? rust-lang/unsafe-code-guidelines#555

Closed

RalfJung mentioned this pull request Apr 29, 2025

What about: Pointer-to-integer transmutes? rust-lang/unsafe-code-guidelines#286

Open

joshlf commented Apr 29, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

RalfJung reviewed May 7, 2025

View reviewed changes

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

ia0 reviewed May 23, 2025

View reviewed changes

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

RalfJung reviewed May 23, 2025

View reviewed changes

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

RalfJung reviewed May 23, 2025

View reviewed changes

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

RalfJung reviewed May 23, 2025

View reviewed changes

library/core/src/mem/maybe_uninit.rs Outdated Show resolved Hide resolved

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 30, 2025

This comment has been minimized.

Sign in to view

joshlf force-pushed the patch-13 branch from 0605f63 to 470e856 Compare August 31, 2025 14:05

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Aug 31, 2025

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 31, 2025

RalfJung reviewed Sep 8, 2025

View reviewed changes

RalfJung added the I-lang-nominated Nominated for discussion during a lang team meeting. label Sep 8, 2025

RalfJung reviewed Sep 8, 2025

View reviewed changes

traviscross added I-lang-radar Items that are on lang's radar and will need eventual work or consideration. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang T-lang Relevant to the language team labels Sep 9, 2025

traviscross reviewed Sep 16, 2025

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 16, 2025

traviscross removed I-lang-nominated Nominated for discussion during a lang team meeting. P-lang-drag-1 Lang team prioritization drag level 1. https://rust-lang.zulipchat.com/#narrow/channel/410516-t-lang labels Sep 16, 2025

Document MaybeUninit bit validity

7971b2a

Co-authored-by: Ralf Jung <[email protected]> Edited-by: TC

traviscross force-pushed the patch-13 branch from 24f8f42 to 77e3f1a Compare September 21, 2025 13:35

Revise MaybeUninit validity documentation

15ce070

Let's rewrite this for better clarity. In particular, let's document our language guarantees upfront and in positive form. We'll then list the caveats and the non-guarantees after.

traviscross force-pushed the patch-13 branch from 77e3f1a to 15ce070 Compare September 21, 2025 13:42

	/// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` at type `T` again (i.e., `transmute(u)` above) may thus
	/// may not be preserved in `MaybeUninit<U>`. Interpreting the representation of `u` as type `T` again (i.e., `transmute(u)` above) may thus

Document MaybeUninit bit validity #140463

Are you sure you want to change the base?

Document MaybeUninit bit validity #140463

Uh oh!

Conversation

joshlf commented Apr 29, 2025

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment has been minimized.

RalfJung commented May 7, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung May 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

RalfJung commented May 23, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RalfJung commented May 30, 2025

Uh oh!

rustbot commented May 30, 2025

Uh oh!

RalfJung commented Jun 1, 2025

Uh oh!

karolzwolak commented Aug 24, 2025

Uh oh!

joshlf commented Aug 27, 2025

Uh oh!

This comment has been minimized.

joshlf commented Aug 30, 2025

Uh oh!

karolzwolak commented Aug 31, 2025

Uh oh!

This comment has been minimized.

joshlf commented Aug 31, 2025

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung commented Sep 8, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung Sep 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RalfJung May 8, 2025 •

edited

Loading

RalfJung May 8, 2025 •

edited

Loading

RalfJung Sep 9, 2025 •

edited

Loading

joshlf Sep 9, 2025 •

edited

Loading

RalfJung Sep 18, 2025 •

edited

Loading

RalfJung Sep 19, 2025 •

edited

Loading

RalfJung Sep 18, 2025 •

edited

Loading

traviscross Sep 18, 2025 •

edited

Loading

RalfJung Sep 18, 2025 •

edited

Loading