Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: guide: document saved state conventions and howto #846

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions Guide/src/dev_guide/contrib/guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@ The OpenVMM Guide is written in Markdown, and rendered to HTML using
this Guide in the main OpenVMM GitHub repo, in the
[`Guide/`](https://github.com/microsoft/openvmm/tree/main/Guide) folder.

## Guide conventions

* Line Length: Wrap lines at 80 characters.

## Editing the Guide

### Small Changes
Expand Down
42 changes: 31 additions & 11 deletions Guide/src/dev_guide/contrib/release.md
Original file line number Diff line number Diff line change
@@ -1,12 +1,24 @@
# Release Management
mattkur marked this conversation as resolved.
Show resolved Hide resolved

Occasionally, the OpenVMM project will declare upcoming release milestones. We stabilize the code base in a `release/YYMM` branch, typically named for the YYMM when the branch was forked. We expect a high quality bar for all code that goes in to the OpenVMM main branch, we ask developers to hold these `release/YYMM` to the highest quality standards. The OpenVMM maintainers will gradually slow the rate of churn into these branches as we get closer to a close date.

This process should not impact your typical workflow; all new work should go into the `main` branch. But, to ease the cherry-picks, we may ask that you hold off from making breaking or large refactoring changes at points in this process.
Occasionally, the OpenVMM project will declare upcoming release milestones. We
stabilize the code base in a `release/YYMM` branch, typically named for the
YYMM when the branch was forked. We expect a high quality bar for all code that
goes in to the OpenVMM main branch, we ask developers to hold these
`release/YYMM` to the highest quality standards. The OpenVMM maintainers will
gradually slow the rate of churn into these branches as we get closer to a
close date.

This process should not impact your typical workflow; all new work should go
into the `main` branch. But, to ease the cherry-picks, we may ask that you hold
off from making breaking or large refactoring changes at points in this
process.

## Marking, Approval Process, Code Flow

The OpenVMM maintainers will publish various dates for the upcoming releases. Currently, these dates are driven by a Microsoft-internal process and can, and do, often change. Microsoft does not mean to convey any new product launches by choices of these dates.
The OpenVMM maintainers will publish various dates for the upcoming releases.
Currently, these dates are driven by a Microsoft-internal process and can, and
do, often change. Microsoft does not mean to convey any new product launches by
choices of these dates.

Releases naturally fall into several phases:

Expand All @@ -20,18 +32,25 @@ Releases naturally fall into several phases:
### release/2411 process
We track the state of candidates for a given release by tagging the PRs:

* `backport_YYMM`: This PR (to `main`) is a candidate to be included in the `YYMM` release.
* N.B.: A maintainer will _remove_ this tag if the fix is not accepted into the release.
* `backported_YYMM`: This PR (to `main`) has been cherry-picked to the `YYMM` release.
* `backport_YYMM_approved`: This PR (to `main`) has been reviewed by the OpenVMM maintainers, who believe this meets the bar. This is a temporary tag until the PR is cherry-picked (e.g. a TODO).
* `backport_YYMM`: This PR (to `main`) is a candidate to be included in the
`YYMM` release.
* N.B.: A maintainer will _remove_ this tag if the fix is not accepted into
the release.
* `backported_YYMM`: This PR (to `main`) has been cherry-picked to the `YYMM`
release.
* `backport_YYMM_approved`: This PR (to `main`) has been reviewed by the OpenVMM
maintainers, who believe this meets the bar. This is a temporary tag until the
PR is cherry-picked (e.g. a TODO).

#### Seeking Approval for Backport

To seek approval to include a change in a release branch, follow these steps:
* Tag your to-`main` PR with `backport_YYMM`.
* Cherry-pick the change to the appropriate `release/YYMM` branch in your fork and stage a PR to that same branch in the main repository.
* Cherry-pick the change to the appropriate `release/YYMM` branch in your fork
and stage a PR to that same branch in the main repository.

Please reach out to the maintainers before staging that PR if you have any doubts.
Please reach out to the maintainers before staging that PR if you have any
doubts.

## Existing Release Branches

Expand All @@ -40,4 +59,5 @@ Please reach out to the maintainers before staging that PR if you have any doubt
| release/2411 | Ask Mode | |

## Depending on Releases
We welcome feedback, especially if you would like to depend on a reliable release process. Please reach out!
We welcome feedback, especially if you would like to depend on a reliable
release process. Please reach out!
57 changes: 46 additions & 11 deletions Guide/src/dev_guide/contrib/save-state.md
Original file line number Diff line number Diff line change
@@ -1,29 +1,63 @@
# Save State

OpenHCL supports the mechanism of saving & restoring state. This primitive can be used for various VM operations, but a key use case is for updating OpenHCL at runtime (a.k.a. "Servicing"). This save state can be stored in memory or on durable media, read back at a later time.
OpenHCL supports the mechanism of saving & restoring state. This primitive can
be used for various VM operations, but a key use case is for updating OpenHCL at
runtime (a.k.a. "Servicing"). This save state can be stored in memory or on
durable media, read back at a later time.

## Save State First Principles

Here are the principles you must maintain when adding new save & restore code:

1. **Save & Restore is Forward & Backwards Compatible**: A newer version of OpenHCL must understand save state from a prior version, and an older version must not crash when reading save state from a newer version.
2. **Do not break save state after it is in use**: Save state must be compatible from *any* commit to any other commit, once the product has shipped and started using that save state.[^1]
3. **All Save State is Protobuf**: All save state is encoded as `ProtoBuf`, using `mesh`.
1. **Save & Restore is Forward & Backwards Compatible**: A newer version of
OpenHCL must understand save state from a prior version, and an older version
must not crash when reading save state from a newer version.
2. **Do not break save state after that state is in use**: Save state must be
compatible from *any* commit to any other commit, once the product has
shipped and started using that save state.[^1]
3. **All Save State is Protocol Buffers**: All save state is encoded as
`ProtoBuf`, using `mesh`.

## Conventions

1. **Put save state in it's own module** This makes PR reviews easier, to catch any mistakes updating save state.
2. **Create a unique package per crate** A logical grouping of saved state should have the same `package`
1. **Put save state in it's own module**: This makes PR reviews easier, to catch
any mistakes updating save state.
2. **Create a unique package per crate**: A logical grouping of saved state
should have the same `package`
3. **Avoid unsupported types, when possible**: These types don't support the
default values needed for safely extending save state:
* Arrays: if you need to add an array, consider a `vec` or `Option<[T; N]>`
instead.
* Enum: if you need to add an enum, add it as `Option<MyEnum>' instead.

## Updating Saved State
## Updating (Extending) Saved State

Since saved state is just Protocol Buffers, use the [guide to updating Protocol Buffers messages](https://protobuf.dev/programming-guides/proto3/#updating) as a starting point, with the following caveats:
Since saved state is just Protocol Buffers, use the [guide to updating Protocol
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to choose a stopping point for this PR. But I would also want to provide some more concrete advice, such as:

When a field has been added and your code reads an old saved state that did not have that field, the field will get the default value, as defined below. You must define the field's type so that the default value makes sense when reading an old saved state.

The easiest way to do this is to add fields with a type Option<T>, so that you can distinguish between old saved states (None) and new ones (Some(_)). But for field types with well-defined default values, this just adds an extra layer of complexity, so consider defining things so that the default value means the same thing as the value being missing.

For example, if you are saving whether the guest enabled some new feature, just use a bool for that, since false is a reasonable value for an old saved state where the feature didn't exist; Option<bool> just adds a third state that the reader has to go look in your save/restore code to understand.

However, for types where there are no default values (currently, enums and arrays), you must wrap the type in an Option to add it to an existing saved state structure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the bool case, that makes a lot of sense. On the flip-side, I'm not sure how I'd feed about seeing something like queue_count: usize (or whatever), where 0 is the in-band representation of missing state.

In general, the Go/Protobuf style of using defaults as in-band signals for missing data just rubs me the wrong way, as now the compiler isn't gonna force you to consider those cases explicitly.

Copy link
Member

@jstarks jstarks Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If queue_count would otherwise never be 0 in a newly generated saved state, then I agree. E.g., if the natural default for queue count were 1, then I wouldn't want us to treat 0 the same as 1--we should use an Option to clarify that the value might be missing (ideally with NonZeroU32 or whatever, which I don't think we support today but we could).

Maybe another way to phrase it is, if you have to have a conditional (explicitly or implicitly via unwrap_or_else or similar) in your restore path to handle the missing case, it should probably be on an Option. If you wouldn't have needed the condition except that you chose to use an Option... well, maybe that was the wrong choice.

Most egregious to me is where the newly added field has a natural default of true, so that None is equivalent to Some(true) rather than Some(false). The guidance should be to flip the parity of the bool in that case, rather than have this unintuitive mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to hit pause on new content for this PR. I'll fix up the comment above (#846 (comment)). When I hit pause, I'll file an issue to track any feedback not yet captured.

Buffers messages](https://protobuf.dev/programming-guides/proto3/#updating) as a
starting point, with the following caveats:

1. OpenVMM uses `sint32` to represent the `i32` native type. Therefore, changing `i32` to `u32` is a breaking change, for example.
1. OpenVMM uses `sint32` to represent the `i32` native type. Therefore, changing
`i32` to `u32` is a breaking change, for example.
2. The Protocol Buffers docs mention what happens for newly added fields, but it
bears adding some nuance here:
1. `arrays` and `enums` are **not supported**. Reading new save state with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are supported, you just need to wrap them in Option.

either will fail on an older build.
2. Old -> New Save State: Save state from a prior revision will not contain
some newly added fields. Those fields will get the [default
values](https://protobuf.dev/programming-guides/proto3/#default). This is
how that breaks down for the rust types:
* `Option<T>` => `None`
* Structs => each field gets that field's default value
* Vecs => empty vec
* Numbers => 0
* Strings => `""`
3. New -> Old Save State: Unknown fields are ignored.

## Defining Saved State

Saved state is defined as a `struct` that has `#[derive(Protobuf)]` and `#[mesh(package = "package_name")]` attributes. Here is an example, taken from the `nvme_driver`:
Saved state is defined as a `struct` that has `#[derive(Protobuf)]` and
`#[mesh(package = "package_name")]` attributes. Here is an example, taken from
the `nvme_driver`:

```rust
pub mod save_restore {
Expand All @@ -45,4 +79,5 @@ pub mod save_restore {
}
```

[^1]: Saved state is in use when it reaches a release branch that is in tell mode. See [release management](./release.md) for details.
[^1]: Saved state is in use when it reaches a release branch that is in tell
mode. See [release management](./release.md) for details.