SNP Guest VSM: Start VP hypercall handling #634

sluck-msft · 2025-01-08T22:52:40Z

Start VP hypercall handling for SNP Guest VSM support.

Tested:

SNP + Guest VSM boots 8 VPs
SNP without Guest VSM boots
non-isolated VM still boots

openhcl/virt_mshv_vtl/src/processor/mod.rs

smalis-msft · 2025-01-24T16:35:57Z

openhcl/virt_mshv_vtl/src/lib.rs

-    /// doesn't handle EnableVpVtl, there's no obvious place to set this.
-    hcvm_vtl1_enabled: Mutex<bool>,
+    /// Only modified for hardware CVMs.
+    hcvm_vtl1_state: Mutex<UhVpCvmVtl1State>,


Could we move this off of VpInner so non-CVMs don't have it anymore? Now that we have partition-wide per-backing state we could put a Vec on UhCvmPartitionState to hold these instead. I was already adding vp_count to BackingSharedParams in #704, so you could copy that bit to size the Vec.

I feel like there's a pattern here, Steven, where you add/suggest adding a bunch of per-X things and I come by and say "hey, just make a single per-X structure and put everything in that." In this case, should we have a Vec<VpInnerCvm> somewhere instead of a bunch of per-VP Vec<Whatever>s?

Sometimes separate Vecs do make sense, for memory locality reasons, but we should be intentional when we do that.

Usually I hold off on making that per-thing struct until there's going to be more than one field in it, but yes it does seem like we're heading in that direction.
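For reference, a minimal sketch of the "single per-VP structure" idea discussed above. All names except `vp_count` are hypothetical stand-ins (`Vtl1State`, `VpInnerCvm`, and `CvmPartitionState` are not the real `UhVpCvmVtl1State` or `UhCvmPartitionState` definitions); the point is only that one `Vec` of a per-VP struct, sized from `vp_count`, replaces several parallel per-VP `Vec`s.

```rust
use std::sync::Mutex;

/// Hypothetical per-VP VTL 1 state; the real `UhVpCvmVtl1State` may differ.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Vtl1State {
    pub enabled: bool,
    pub started: bool,
}

/// Hypothetical per-VP, CVM-only state. New per-VP CVM fields would be
/// added here instead of as another parallel `Vec` on the partition.
#[derive(Debug, Default)]
pub struct VpInnerCvm {
    pub vtl1: Mutex<Vtl1State>,
}

/// Hypothetical stand-in for the partition-wide CVM state.
#[derive(Debug)]
pub struct CvmPartitionState {
    vps: Vec<VpInnerCvm>,
}

impl CvmPartitionState {
    /// Allocates one entry per VP, so non-CVM partitions never pay for it.
    pub fn new(vp_count: usize) -> Self {
        Self {
            vps: (0..vp_count).map(|_| VpInnerCvm::default()).collect(),
        }
    }

    /// Returns the CVM state for a VP, or `None` if the index is out of range.
    pub fn vp(&self, vp_index: usize) -> Option<&VpInnerCvm> {
        self.vps.get(vp_index)
    }
}
```

Separate `Vec`s can still be the right call when memory locality matters, as noted above; this sketch only shows the default shape.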

smalis-msft · 2025-01-24T16:37:43Z

openhcl/virt_mshv_vtl/src/processor/mod.rs

@@ -254,6 +255,8 @@ mod private {

        fn untrusted_synic(&self) -> Option<&ProcessorSynic>;
        fn untrusted_synic_mut(&mut self) -> Option<&mut ProcessorSynic>;
+
+        fn set_exit_vtl(this: &mut UhProcessor<'_, Self>, vtl: GuestVtl);


Is there ever a case where we're setting the exit vtl but not switching vtl state? I think this should just be handled inside of those two impls.

Ah I see, that's how startvp works. Hmmmm, I really don't like having this here since it's only used for hardware isolated backings, but we need it in handle wake....

@jstarks any thoughts?

I think maybe the way we could achieve it is to "flip" start_virtual_processor so that it's not in the common processor\mod.rs, but has individual implementations per-backing, and then they can call into a common validation code in processor\mod.rs, and for CVMs set the exit_vtl only in the CVM backing. Is that preferable? The main thing I'm not sure about is the ordering of when the exit vtl is set vs the initial state, but I think technically it doesn't matter.

I think we need to 1. restrict the VP start code to be CVM only, since the hypervisor handles it for non-CVM (I think?? I see that we have it in our hypercall table, but I would expect the hypervisor to just handle it, just like it handles INIT) and 2. update the wake code to call out to backing for "extended" wakes, and then have the CVM backings call into some common "handle_cvm_wake" call or something. Then we'll have access to all the CVM stuff in start VP, without needing an extra trait method.

So even with startup suspend, we have no reason to believe that openhcl needs to handle startvp for non-cvms? Can I safely remove it from the hypercall table as well?

Maybe it's worth digging into the closed source history to see why we originally added it for non-cvm.

I suspect we just need it in the APIC emulation case. We also originally had it when the hypervisor didn't have precise VP startup support and we had to emulate INIT/SIPI/startup suspend as well... but that never worked well, as you recall.

I think if you try to remove it from the hypercall table, you also need to make sure we don't report StartVp support to the guest in the case where OpenHCL is providing the mshv privileges/features via cpuid. So there will be a little plumbing through hv1_emulator to make this conditional (we really need to make a full builder pattern or something for that...)

It might be cleanest to make this removal/refactor a separate PR, then layer this one on top. Up to you.

OK, to make sure I understand what should happen, it would be:

1. Remove startvp hypercall handling for non-CVMs, and don't report startvp support to the guest.

2. Create an "extended wakes" backing, and move the startvp wake handling onto the backings for CVMs.

Is that correct? I'm not sure about item 2, based on this comment:

"I suspect we just need it in the APIC emulation case."
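One possible shape for the "extended wakes" idea, as a rough sketch only. The trait and type names here (`Backing`, `HardwareIsolatedBacking`, `Processor`, `Wake`, `handle_extended_wake`, `exit_vtl_mut`) are simplified stand-ins invented for illustration, not the real `virt_mshv_vtl` definitions; `handle_cvm_wake` is the common helper name floated in the thread. The common wake path delegates anything it does not understand to the backing, and only CVM backings forward to the shared CVM helper, so no `set_exit_vtl` method is needed on the all-backings trait.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestVtl {
    Vtl0,
    Vtl1,
}

/// Hypothetical wake reasons; the real wake events differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Wake {
    Interrupt,
    StartVp(GuestVtl),
}

pub struct Processor<B> {
    pub backing: B,
}

/// Stand-in for the trait implemented by every backing.
pub trait Backing: Sized {
    /// Called for wakes the common code does not handle. Ignored by default.
    fn handle_extended_wake(_this: &mut Processor<Self>, _wake: Wake) {}
}

/// Stand-in for the CVM-only trait; exit VTL access lives here.
pub trait HardwareIsolatedBacking: Backing {
    fn exit_vtl_mut(&mut self) -> &mut GuestVtl;
}

/// Common CVM wake handling, shared by all hardware-isolated backings.
pub fn handle_cvm_wake<B: HardwareIsolatedBacking>(this: &mut Processor<B>, wake: Wake) {
    if let Wake::StartVp(vtl) = wake {
        *this.backing.exit_vtl_mut() = vtl;
    }
}

impl<B: Backing> Processor<B> {
    /// Common wake path: handles shared events, delegates the rest.
    pub fn handle_wake(&mut self, wake: Wake) {
        match wake {
            Wake::Interrupt => { /* common interrupt handling would go here */ }
            other => B::handle_extended_wake(self, other),
        }
    }
}

/// Non-CVM backing: relies on the default no-op.
pub struct HypervisorBacked;
impl Backing for HypervisorBacked {}

/// CVM backing: forwards extended wakes to the common CVM helper.
pub struct Snp {
    pub exit_vtl: GuestVtl,
}
impl Backing for Snp {
    fn handle_extended_wake(this: &mut Processor<Self>, wake: Wake) {
        handle_cvm_wake(this, wake);
    }
}
impl HardwareIsolatedBacking for Snp {
    fn exit_vtl_mut(&mut self) -> &mut GuestVtl {
        &mut self.exit_vtl
    }
}
```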

smalis-msft · 2025-01-24T16:44:54Z

openhcl/virt_mshv_vtl/src/processor/mod.rs

+            // vtls and with the DenyLowerVtlStartup, and in practice, it's not clear
+            // whether any guest OS does this. For now, if guest vsm is enabled,
+            // simplify by disallowing repeated vp startup. Revisit this later if it
+            // becomes a problem. Note that this will not apply to non-hardware cvms


Can just be non-cvm

We decided to scope disallowing repeated vp startup to guestvsm + cvm because the edge cases seem to really come in with guest vsm.
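A small sketch of that scoping, under stated assumptions: `VpStartState`, `try_start`, and `StartVpError` are hypothetical names, and the real code would presumably return a hypervisor error code rather than a custom enum. A repeated start is rejected only when the partition is both hardware isolated and has guest VSM enabled; every other configuration keeps the existing permissive behavior.

```rust
use std::sync::Mutex;

/// Hypothetical error type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub enum StartVpError {
    AlreadyStarted,
}

/// Hypothetical per-VP record of whether a start has already been accepted.
#[derive(Debug, Default)]
pub struct VpStartState {
    started: Mutex<bool>,
}

impl VpStartState {
    /// Accepts a start request, rejecting repeats only for CVM + guest VSM.
    pub fn try_start(
        &self,
        hardware_isolated: bool,
        guest_vsm_enabled: bool,
    ) -> Result<(), StartVpError> {
        let mut started = self.started.lock().unwrap();
        if hardware_isolated && guest_vsm_enabled && *started {
            return Err(StartVpError::AlreadyStarted);
        }
        *started = true;
        Ok(())
    }
}
```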

smalis-msft · 2025-01-24T16:46:36Z

openhcl/virt_mshv_vtl/src/processor/hardware_cvm/mod.rs

@@ -1183,7 +1192,7 @@ impl<B: HardwareIsolatedBacking> UhProcessor<'_, B> {
            // Check for VTL preemption - which ignores RFLAGS.IF
            if is_interrupt_pending(self, GuestVtl::Vtl1, false) {
                B::switch_vtl_state(self, GuestVtl::Vtl0, GuestVtl::Vtl1);


I think switch_vtl_state should be responsible for setting exit vtl, so that callers don't have to always call both. Is there ever a case where we're switching state but not exiting?

I thought you had intentionally removed setting the exit vtl from switch_vtl_state in a previous PR, possibly because it makes it more obvious what is happening. I don't have a preference either way; at the moment I don't think we switch the state without changing the exit vtl.

I don't remember doing that but it's certainly possible. Maybe that was when this was only being called in one place instead of multiple? But yeah, I think I'm leaning towards doing it in switch_vtl_state now, so that callers can't mess it up.
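To illustrate the direction this thread lands on, here is a simplified sketch in which the state switch itself records the new exit VTL. The `Vp` type and its single "shared register" are invented for illustration; the real `switch_vtl_state` is a per-backing function taking a `UhProcessor` and moves far more state.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestVtl {
    Vtl0 = 0,
    Vtl1 = 1,
}

/// Hypothetical VP with one register that is shared across VTLs.
pub struct Vp {
    exit_vtl: GuestVtl,
    shared_reg: [u64; 2],
}

impl Vp {
    pub fn new() -> Self {
        Self {
            exit_vtl: GuestVtl::Vtl0,
            shared_reg: [0; 2],
        }
    }

    pub fn set_shared_reg(&mut self, vtl: GuestVtl, value: u64) {
        self.shared_reg[vtl as usize] = value;
    }

    pub fn shared_reg(&self, vtl: GuestVtl) -> u64 {
        self.shared_reg[vtl as usize]
    }

    pub fn exit_vtl(&self) -> GuestVtl {
        self.exit_vtl
    }

    /// Moves shared state from `source` to `target` and updates the exit
    /// VTL in the same call, so callers cannot do one without the other.
    pub fn switch_vtl_state(&mut self, source: GuestVtl, target: GuestVtl) {
        self.shared_reg[target as usize] = self.shared_reg[source as usize];
        self.exit_vtl = target;
    }
}
```

With this shape, a call site such as the VTL preemption check only needs the one call.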

jstarks · 2025-01-24T17:04:56Z

openhcl/virt_mshv_vtl/src/processor/mshv/x64.rs

@@ -424,6 +424,10 @@ impl BackingPrivate for HypervisorBackedX86 {
    fn untrusted_synic_mut(&mut self) -> Option<&mut ProcessorSynic> {
        None
    }
+
+    fn set_exit_vtl(_this: &mut UhProcessor<'_, Self>, _vtl: GuestVtl) {


Don't we have a CVM-specific trait we can hang this off of?

see the thread where I pinged you. This is currently needed in handle_wake, which is implemented on all backings.

jstarks · 2025-01-24T17:06:59Z

openhcl/virt_mshv_vtl/src/processor/mod.rs

@@ -1354,8 +1389,60 @@ impl<T, B: Backing> hv1_hypercall::StartVirtualProcessor<hvdef::hypercall::Initi
        let target_vtl = self.target_vtl_no_higher(target_vtl)?;
        let target_vp = &self.vp.partition.vps[target_vp as usize];

-        // TODO CVM GUEST VSM: probably some validation on vtl1_enabled
-        *target_vp.hv_start_enable_vtl_vp[target_vtl].lock() = Some(Box::new(*vp_context));
+        if self.vp.partition.isolation.is_hardware_isolated() {


@smalis-msft, I think we just need to remove support for doing this kind of stuff in the non-CVM case. We're accumulating too much complexity.

I agree, but I think we want to wait until we have CVM CI coverage, right?

In this case I don't know if it's worth it, even in the short run. We should be able to remove this for non-CVM without breaking guest compatibility, as long as we don't report to the guest that we support the hypercall.
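A rough sketch of what "don't report it, don't handle it" could look like if the emulator were configured through a builder. Everything here is hypothetical: `EmulatorBuilder`, `Emulator`, and `Privileges` are illustrative names, not the `hv1_emulator` API, and the privilege set is reduced to a single flag. The idea is that one configuration input drives both what is advertised to the guest and whether the hypercall is dispatched, so the two cannot disagree.

```rust
/// Hypothetical, heavily reduced privilege set.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Privileges {
    pub start_virtual_processor: bool,
}

/// Hypothetical builder for the emulated hypervisor interface.
#[derive(Debug, Default)]
pub struct EmulatorBuilder {
    hardware_isolated: bool,
}

impl EmulatorBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn hardware_isolated(mut self, value: bool) -> Self {
        self.hardware_isolated = value;
        self
    }

    pub fn build(self) -> Emulator {
        Emulator {
            privileges: Privileges {
                // Only CVMs handle StartVp in the paravisor in this sketch;
                // for other VMs the hypervisor is assumed to handle it.
                start_virtual_processor: self.hardware_isolated,
            },
        }
    }
}

pub struct Emulator {
    privileges: Privileges,
}

impl Emulator {
    /// What would be reported to the guest via cpuid.
    pub fn reported_privileges(&self) -> Privileges {
        self.privileges
    }

    /// Whether the StartVp hypercall should be in the dispatch table.
    pub fn dispatches_start_vp(&self) -> bool {
        self.privileges.start_virtual_processor
    }
}
```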

jstarks · 2025-01-24T17:10:21Z

openhcl/virt_mshv_vtl/src/processor/mod.rs

+            // whether any guest OS does this. For now, if guest vsm is enabled,
+            // simplify by disallowing repeated vp startup. Revisit this later if it
+            // becomes a problem. Note that this will not apply to non-hardware cvms
+            // as this may regress existing VMs.


Given our goals around guest compatibility within CVMs, this might become a problem. What would it take to support this correctly?

There was speculation over whether any guest actually does this or will want to do this vs an INIT+SIPI. Do you have a concrete example?

I don't have any examples. But the claim here is that 1. we don't need to support it but 2. it's important to support it for non-CVM cases. Given our compatibility goals, these can't both be true.

Does legacy HCL support this? Presumably the hypervisor supports it. Why is it difficult to support here?

My rudimentary understanding is that the desired approach would be to mimic the startup suspend changes that were made in the hypervisor so that start vp only takes effect when the higher VTL does a VTL return. Then handling the shared vtl registers and reasoning about higher vtl state is more straightforward. I agree that claims 1 and 2 can't both be true, but at least for this PR I think I'm willing to take the compromise for CVMs given that we haven't shipped guest vsm support for OpenHCL.

We can't take the legacy HCL as an example because it doesn't properly support it.

OK. I think if you make the other suggested changes (in particular, making this whole code path CVM only), then it's certainly OK for this PR to only support what guests actually do.

sluck-msft assigned smalis-msft Jan 8, 2025

sluck-msft requested a review from a team as a code owner January 8, 2025 22:52

smalis-msft reviewed Jan 9, 2025


sluck-msft added 3 commits January 22, 2025 14:30

hack implementation

6aa0533

cleanup

d7e0dbe

disallow startvp on a running vp

37d1d1c

sluck-msft force-pushed the gvsm/start-vp branch from 899e334 to 37d1d1c Compare January 23, 2025 01:57

smalis-msft reviewed Jan 24, 2025


jstarks reviewed Jan 24, 2025
