RFC: guide: document saved state conventions and howto #846

Open
wants to merge 2 commits into base: main


@mattkur (Contributor) commented Feb 12, 2025

Add some documentation on how devs should author and think about saved state.

@mattkur mattkur requested a review from a team as a code owner February 12, 2025 19:41
@github-actions github-actions bot added the Guide label Feb 12, 2025
@yupavlen-ms (Contributor) commented:

As a follow-up we need a reference to SaveRestore trait, how it applies to vmcore crate and why some drivers may not be able to use it.

@jstarks (Member) commented Feb 12, 2025

Please wrap all markdown at 80 chars for easier offline reading. Consider the "rewrap" extension in vscode.

@jstarks (Member) commented Feb 12, 2025

Can you add a section about how to extend an existing saved state blob? Notes:

  • When you read an old saved state on a new build, any fields that were missing will get their default value.
    • Option<T> => None,
    • Numbers => 0,
    • Strings => "",
    • Structs => each field gets its default value
    • Enums => currently fails: if you need to add an enum, add it as Option<MyEnum>
    • Vecs => empty vec
    • Arrays => fails: if you need to add an array, consider a Vec or Option<[T; N]>.
  • When you read a new saved state on an old build, unknown fields will be ignored.
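A minimal Rust sketch of the two rules above, under loud assumptions: `StateV1`, `StateV2`, and the `read_*`/`write_*` helpers are hypothetical, and `Blob` is a toy list of (field number, value) pairs standing in for the real Protocol Buffers encoding. It models only the field-number matching behind the rules, not the enum and array failures.

```rust
/// Toy stand-in for an encoded message: (field number, value) pairs.
/// Hypothetical; real saved state uses the protobuf wire format.
type Blob = Vec<(u32, u64)>;

/// Version 1 of a hypothetical saved-state struct.
#[derive(Debug, Default, PartialEq)]
struct StateV1 {
    queue_count: u64, // field 1
}

/// Version 2 adds field 2.
#[derive(Debug, Default, PartialEq)]
struct StateV2 {
    queue_count: u64,      // field 1
    feature_enabled: bool, // field 2, new in this version
}

fn write_v1(s: &StateV1) -> Blob {
    vec![(1, s.queue_count)]
}

fn write_v2(s: &StateV2) -> Blob {
    vec![(1, s.queue_count), (2, u64::from(s.feature_enabled))]
}

/// Old reader: any field number it does not know is skipped.
fn read_v1(blob: &[(u32, u64)]) -> StateV1 {
    let mut s = StateV1::default();
    for &(field, value) in blob {
        match field {
            1 => s.queue_count = value,
            _ => {} // unknown field from a newer build: ignored
        }
    }
    s
}

/// New reader: starts from defaults, so a field absent from an old blob
/// keeps its default value (`false` for `feature_enabled`).
fn read_v2(blob: &[(u32, u64)]) -> StateV2 {
    let mut s = StateV2::default();
    for &(field, value) in blob {
        match field {
            1 => s.queue_count = value,
            2 => s.feature_enabled = value != 0,
            _ => {}
        }
    }
    s
}
```

In this model, an old blob read by `read_v2` yields `feature_enabled: false`, and a new blob read by `read_v1` silently drops field 2.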

@jstarks (Member) commented Feb 12, 2025

I've opened #848 to address the problem with adding a new enum field causing a potential compat break.

@mattkur (Contributor, Author) commented Feb 12, 2025

> As a follow-up we need a reference to SaveRestore trait, how it applies to vmcore crate and why some drivers may not be able to use it.

Thanks for calling this out. I'm happy to add that to this PR, but haven't done the research to answer the questions raised here. Alternatively, please file an issue so we don't forget.

`i32` to `u32` is a breaking change, for example.
2. The Protocol Buffers docs mention what happens for newly added fields, but it
bears adding some nuance here:
1. `arrays` and `enums` are **not supported**. Reading new save state with

Member commented:

They are supported, you just need to wrap them in Option.
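A sketch of what wrapping looks like, assuming a hypothetical `PowerMode` enum added to a hypothetical `SavedState` struct. Rust's own `#[derive(Default)]` is a handy analogy: it would reject a `power_mode: PowerMode` field because the enum has no default, but accepts `Option<PowerMode>`. In the usage below, `..Default::default()` only stands in for restoring an old blob that never wrote the field; it is not the real decoder.

```rust
/// Hypothetical enum added to an existing saved-state struct.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PowerMode {
    Normal,
    LowPower,
}

/// `PowerMode` has no default value, so it is added as `Option<PowerMode>`:
/// an old saved state, which never wrote the field, restores as `None`.
#[derive(Debug, Default)]
struct SavedState {
    queue_count: u32,                // existing field
    power_mode: Option<PowerMode>,   // newly added field
}

/// The restore path decides explicitly what an old saved state means.
fn restore_power_mode(s: &SavedState) -> PowerMode {
    match s.power_mode {
        Some(mode) => mode,
        // Assumption for this sketch: builds that predate the field
        // always behaved as `Normal`.
        None => PowerMode::Normal,
    }
}
```

The `None => PowerMode::Normal` mapping is an assumption for the example; the right mapping depends on how older builds actually behaved.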


Since saved state is just Protocol Buffers, use the [guide to updating Protocol Buffers messages](https://protobuf.dev/programming-guides/proto3/#updating) as a starting point, with the following caveats:

Member commented:

Feel free to choose a stopping point for this PR. But I would also want to provide some more concrete advice, such as:

When a field has been added and your code reads an old saved state that did not have that field, the field will get the default value, as defined below. You must define the field's type so that the default value makes sense when reading an old saved state.

The easiest way to do this is to add fields with a type Option<T>, so that you can distinguish between old saved states (None) and new ones (Some(_)). But for field types with well-defined default values, this just adds an extra layer of complexity, so consider defining things so that the default value means the same thing as the value being missing.

For example, if you are saving whether the guest enabled some new feature, just use a bool for that, since false is a reasonable value for an old saved state where the feature didn't exist; Option<bool> just adds a third state that the reader has to go look in your save/restore code to understand.
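A sketch contrasting the two choices for the feature flag described above. Both structs and the field name `feature_x_enabled` are hypothetical, and `Default::default()` stands in for an old saved state in which the field was absent.

```rust
/// Plain `bool`: an old saved state predates the feature, so the missing
/// field reads as `false`, which is also correct, since the guest could
/// not have enabled a feature that did not exist.
#[derive(Debug, Default)]
struct SavedState {
    feature_x_enabled: bool,
}

/// No special case for old saved states is needed.
fn restore_feature_x(s: &SavedState) -> bool {
    s.feature_x_enabled
}

/// `Option<bool>` alternative: a third state whose meaning the reader has
/// to look up in the restore code (`None` behaves like `Some(false)`).
#[derive(Debug, Default)]
struct SavedStateWithOption {
    feature_x_enabled: Option<bool>,
}

fn restore_feature_x_with_option(s: &SavedStateWithOption) -> bool {
    s.feature_x_enabled.unwrap_or(false)
}
```

Both restore functions return the same answers; the `Option` version just adds a case the reader must reason about.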

However, for types where there are no default values (currently, enums and arrays), you must wrap the type in an Option to add it to an existing saved state structure.

Contributor commented:

For the bool case, that makes a lot of sense. On the flip side, I'm not sure how I'd feel about seeing something like queue_count: usize (or whatever), where 0 is the in-band representation of missing state.

In general, the Go/Protobuf style of using defaults as in-band signals for missing data just rubs me the wrong way, as now the compiler isn't gonna force you to consider those cases explicitly.

@jstarks (Member) commented Feb 13, 2025

If queue_count would otherwise never be 0 in a newly generated saved state, then I agree. E.g., if the natural default for queue count were 1, then I wouldn't want us to treat 0 the same as 1--we should use an Option to clarify that the value might be missing (ideally with NonZeroU32 or whatever, which I don't think we support today but we could).

Maybe another way to phrase it is, if you have to have a conditional (explicitly or implicitly via unwrap_or_else or similar) in your restore path to handle the missing case, it should probably be on an Option. If you wouldn't have needed the condition except that you chose to use an Option... well, maybe that was the wrong choice.
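A sketch of the `queue_count` case from this comment, assuming a hypothetical `SavedState` whose count is never 0 in a newly generated saved state and whose value for an old saved state should be 1. It uses `std::num::NonZeroU32` in plain Rust; as noted above, the saved-state encoding may not support `NonZeroU32` today. The single conditional in the restore path sits on the `Option`.

```rust
use std::num::NonZeroU32;

/// Hypothetical saved state: `queue_count` was added later. `Option` marks
/// the old-state case and `NonZeroU32` rules out an in-band zero.
#[derive(Debug, Default)]
struct SavedState {
    queue_count: Option<NonZeroU32>,
}

/// Save side: a live device always has at least one queue.
fn save(queue_count: NonZeroU32) -> SavedState {
    SavedState { queue_count: Some(queue_count) }
}

/// Restore side: the only conditional is on the `Option`.
/// Assumption for this sketch: builds that predate the field used one queue.
fn restore_queue_count(s: &SavedState) -> u32 {
    s.queue_count.map_or(1, NonZeroU32::get)
}
```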

Most egregious to me is where the newly added field has a natural default of true, so that None is equivalent to Some(true) rather than Some(false). The guidance should be to flip the parity of the bool in that case, rather than have this unintuitive mapping.
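A sketch of the parity flip, using a hypothetical `doorbell` flag whose natural value is `true`. As before, `Default::default()` stands in for an old saved state that lacks the field; the struct and field names are illustrative only.

```rust
/// Awkward: the natural value is `true`, so a missing field (`None`)
/// has to be read as `Some(true)`.
#[derive(Debug, Default)]
struct SavedStateAwkward {
    doorbell_enabled: Option<bool>,
}

/// Flipped parity: the default (`false`) means "not disabled", which is
/// also the correct reading of an old saved state.
#[derive(Debug, Default)]
struct SavedStateFlipped {
    doorbell_disabled: bool,
}

fn restore_awkward(s: &SavedStateAwkward) -> bool {
    // The reader must know that a missing value means "enabled".
    s.doorbell_enabled.unwrap_or(true)
}

fn restore_flipped(s: &SavedStateFlipped) -> bool {
    !s.doorbell_disabled
}
```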

@mattkur (Contributor, Author) commented:

I'd like to hit pause on new content for this PR. I'll fix up the comment above (#846 (comment)). When I hit pause, I'll file an issue to track any feedback not yet captured.

4 participants