
server: state driver request queue dispositions are difficult to manage #879

Closed
gjcolombo opened this issue Mar 5, 2025 · 0 comments · Fixed by #880
Labels
server Related specifically to the Propolis server API and its VM management functions.


@gjcolombo
Contributor

This arose from the discussion of #873.

The problem

The server state driver's external request queue (source) receives requests to operate on a VM and decides whether to Enqueue, Ignore, or Deny them. The queue stores the current disposition for each kind of request it can receive. When the queue receives a new request, or when the state driver notifies it of a change in VM state, the queue runs its get_new_dispositions function to compute the new set of dispositions.
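To make the mechanism concrete, here is a minimal sketch of a queue that keeps one stored disposition per request kind and recomputes the whole table from the previous table plus the incoming event. Only `Enqueue`, `Ignore`, `Deny`, and the name `get_new_dispositions` come from the issue. `RequestKind`, `QueueEvent`, `Dispositions`, `ExternalRequestQueue`, and the specific transition rules are hypothetical simplifications, not Propolis's actual code.

```rust
use std::collections::VecDeque;

/// The three outcomes named in the issue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Disposition {
    Enqueue,
    Ignore,
    Deny,
}

/// Hypothetical request kinds; the real queue handles more.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RequestKind {
    Start,
    Stop,
    Reboot,
}

/// Hypothetical inputs that trigger a recomputation.
#[derive(Clone, Copy, Debug)]
pub enum QueueEvent {
    Queued(RequestKind),
    VmStarted,
}

/// The stored disposition for each request kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Dispositions {
    pub start: Disposition,
    pub stop: Disposition,
    pub reboot: Disposition,
}

impl Dispositions {
    pub fn get(&self, kind: RequestKind) -> Disposition {
        match kind {
            RequestKind::Start => self.start,
            RequestKind::Stop => self.stop,
            RequestKind::Reboot => self.reboot,
        }
    }
}

/// Old-style update (hypothetical rules): the only inputs are the previous
/// dispositions and the event. Nothing here records which requests are pending.
pub fn get_new_dispositions(old: Dispositions, event: QueueEvent) -> Dispositions {
    let mut new = old;
    match event {
        // A second start request is redundant.
        QueueEvent::Queued(RequestKind::Start) => new.start = Disposition::Ignore,
        // Once a stop is queued, later starts and reboots are refused.
        QueueEvent::Queued(RequestKind::Stop) => {
            new.start = Disposition::Deny;
            new.stop = Disposition::Ignore;
            new.reboot = Disposition::Deny;
        }
        QueueEvent::Queued(RequestKind::Reboot) => {}
        // Reboots become possible once the VM runs; this arm cannot see
        // whether a stop request is already waiting in the queue.
        QueueEvent::VmStarted => new.reboot = Disposition::Enqueue,
    }
    new
}

pub struct ExternalRequestQueue {
    dispositions: Dispositions,
    pending: VecDeque<RequestKind>,
}

impl ExternalRequestQueue {
    /// A not-yet-started VM: reboots are denied.
    pub fn new() -> Self {
        Self {
            dispositions: Dispositions {
                start: Disposition::Enqueue,
                stop: Disposition::Enqueue,
                reboot: Disposition::Deny,
            },
            pending: VecDeque::new(),
        }
    }

    /// Returns `Err` for denied requests; ignored requests succeed without being queued.
    pub fn try_queue(&mut self, kind: RequestKind) -> Result<(), RequestKind> {
        match self.dispositions.get(kind) {
            Disposition::Enqueue => {
                self.pending.push_back(kind);
                self.dispositions =
                    get_new_dispositions(self.dispositions, QueueEvent::Queued(kind));
                Ok(())
            }
            Disposition::Ignore => Ok(()),
            Disposition::Deny => Err(kind),
        }
    }

    /// Called by the state driver when the VM starts.
    pub fn notify_started(&mut self) {
        self.dispositions = get_new_dispositions(self.dispositions, QueueEvent::VmStarted);
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

In this sketch a reboot is refused before start, a duplicate start is silently ignored, and a reboot is accepted after `notify_started`.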

In general, the queue tries to deny requests that either can't be serviced right now or that might be preempted by some other, as-yet-unprocessed event. For example, the queue denies requests to reboot an instance that's not running, but it also denies reboot requests that are preceded by an unprocessed request to migrate or stop the instance: if a migration is pending, the reboot request may need to be redirected to the migration target, and if a stop is pending, the reboot request won't be actionable by the time it reaches the front of the queue.

The problem is that get_new_dispositions computes the new dispositions as a function of the incoming event and the queue's previous dispositions, and the old dispositions don't always provide enough context to compute new ones accurately. Consider the case above: if a VM is rejecting reboot requests, but then it starts, should reboot requests now be accepted? The answer is "no" if a request to stop/migrate is pending and "yes" otherwise, but there's no way to distinguish these cases just from the prior disposition, because reboot requests on an un-started instance are rejected regardless.

As the state driver gets more complex, and begins to interleave handling of various kinds of requests (see #873), this is going to get more and more difficult to manage.
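The ambiguity can be shown in a few lines. In this hypothetical sketch (`History`, `reboot_before_start`, `desired_reboot_after_start`, and `old_style_reboot_after_start` are illustrative names, not Propolis code), two histories of an un-started VM differ only in whether a stop is pending. Both leave the stored reboot disposition at `Deny`, yet they call for different dispositions once the VM starts.

```rust
/// The three dispositions named in the issue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Disposition {
    Enqueue,
    Ignore,
    Deny,
}

/// Hypothetical summary of what has happened to an un-started VM.
pub struct History {
    pub stop_pending: bool,
}

/// Reboot disposition while the VM is not yet running: denied either way,
/// so the stored value does not record `stop_pending`.
pub fn reboot_before_start(_history: &History) -> Disposition {
    Disposition::Deny
}

/// The reboot disposition the queue should adopt once the VM starts.
pub fn desired_reboot_after_start(history: &History) -> Disposition {
    if history.stop_pending {
        Disposition::Deny
    } else {
        Disposition::Enqueue
    }
}

/// A hypothetical old-style rule: its only input is the prior reboot
/// disposition, so it must give the same answer for both histories.
pub fn old_style_reboot_after_start(prev: Disposition) -> Disposition {
    match prev {
        Disposition::Deny => Disposition::Enqueue,
        other => other,
    }
}
```

Whatever body `old_style_reboot_after_start` is given, equal inputs produce equal outputs, so it is wrong for at least one of the two histories.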

Proposal

I think the queue can be improved in a couple of ways.

First, request dispositions should be a function of the queue's current state and its memory of the requests that have been queued but not acknowledged by the state driver. This prevents problems like the one described above: if a currently-starting VM queues a request to stop, the queue should remember that request and deny subsequent requests to reboot, even if the state driver successfully starts the VM in the meantime.
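A minimal sketch of this first proposal, assuming a simplified model: each disposition is recomputed from scratch from the current VM state and the list of queued-but-unacknowledged requests, and no previous disposition is consulted. `VmState`, `RequestKind`, `disposition`, `pop_next`, and the specific rules are hypothetical; in particular, treating `pop_next` as the acknowledgement point is a simplification.

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Disposition {
    Enqueue,
    Ignore,
    Deny,
}

/// Hypothetical request kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum RequestKind {
    Start,
    Stop,
    Reboot,
    Migrate,
}

/// Hypothetical coarse VM states reported by the state driver.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum VmState {
    NotStarted,
    Running,
}

/// Computes a disposition from the VM state and pending requests only.
pub fn disposition(state: VmState, pending: &VecDeque<RequestKind>, kind: RequestKind) -> Disposition {
    let stop_or_migrate_pending = pending
        .iter()
        .any(|r| matches!(r, RequestKind::Stop | RequestKind::Migrate));
    let running = state == VmState::Running;
    match kind {
        RequestKind::Start if stop_or_migrate_pending => Disposition::Deny,
        RequestKind::Start if running || pending.contains(&RequestKind::Start) => Disposition::Ignore,
        RequestKind::Start => Disposition::Enqueue,
        RequestKind::Stop if pending.contains(&RequestKind::Stop) => Disposition::Ignore,
        RequestKind::Stop => Disposition::Enqueue,
        // Reboot and migrate need a running VM and no pending request that would preempt them.
        RequestKind::Reboot | RequestKind::Migrate if !running || stop_or_migrate_pending => {
            Disposition::Deny
        }
        RequestKind::Reboot | RequestKind::Migrate => Disposition::Enqueue,
    }
}

pub struct ExternalRequestQueue {
    state: VmState,
    /// Requests queued but not yet acknowledged by the state driver.
    pending: VecDeque<RequestKind>,
}

impl ExternalRequestQueue {
    pub fn new() -> Self {
        Self { state: VmState::NotStarted, pending: VecDeque::new() }
    }

    /// Returns `Err` for denied requests; ignored requests succeed without being queued.
    pub fn try_queue(&mut self, kind: RequestKind) -> Result<(), RequestKind> {
        match disposition(self.state, &self.pending, kind) {
            Disposition::Enqueue => {
                self.pending.push_back(kind);
                Ok(())
            }
            Disposition::Ignore => Ok(()),
            Disposition::Deny => Err(kind),
        }
    }

    /// The state driver reports a VM state change.
    pub fn set_state(&mut self, state: VmState) {
        self.state = state;
    }

    /// The state driver takes (and, in this sketch, acknowledges) the next request.
    pub fn pop_next(&mut self) -> Option<RequestKind> {
        self.pending.pop_front()
    }
}
```

This reproduces the scenario above: a starting VM queues a stop, the driver starts the VM, and a later reboot is still denied because the stop remains pending.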

Second, it would be very handy for the queue to distinguish requests to change a VM's state from requests to change its configuration. This is useful primarily for #873, where we want the state driver to be able to process VCR change requests while a VM is starting without reordering or losing track of state change requests that should be processed only after the VM has successfully started.
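One possible shape for this split, sketched under assumptions: state-change and configuration-change requests are kept in separate FIFO queues, and while the VM is starting the driver may only take configuration changes, so deferred state changes keep their relative order and are not dropped. `StateChange`, `ConfigChange`, `ReplaceVcr { disk_id }`, `pop_next`, and `vm_starting` are hypothetical names. Serving configuration changes first once the VM is no longer starting is an arbitrary choice made for the sketch.

```rust
use std::collections::VecDeque;

/// Hypothetical requests that change the VM's run state.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum StateChange {
    Stop,
    Reboot,
}

/// Hypothetical requests that change the VM's configuration.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ConfigChange {
    ReplaceVcr { disk_id: u32 },
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ExternalRequest {
    State(StateChange),
    Config(ConfigChange),
}

#[derive(Default)]
pub struct ExternalRequestQueue {
    state_changes: VecDeque<StateChange>,
    config_changes: VecDeque<ConfigChange>,
}

impl ExternalRequestQueue {
    pub fn push(&mut self, req: ExternalRequest) {
        match req {
            ExternalRequest::State(s) => self.state_changes.push_back(s),
            ExternalRequest::Config(c) => self.config_changes.push_back(c),
        }
    }

    /// While the VM is starting, only configuration changes are handed out;
    /// state changes stay queued in their original order until start completes.
    pub fn pop_next(&mut self, vm_starting: bool) -> Option<ExternalRequest> {
        if let Some(c) = self.config_changes.pop_front() {
            return Some(ExternalRequest::Config(c));
        }
        if vm_starting {
            return None;
        }
        self.state_changes.pop_front().map(ExternalRequest::State)
    }
}
```

A stop queued during start is held back while a VCR replacement queued after it is processed, then the stop is delivered once the VM has started.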

I have a change ready to go that implements both of these behaviors, as well as a merge commit that extends #873 to take advantage of them. I hope to have a PR up soon.

@gjcolombo gjcolombo added the server Related specifically to the Propolis server API and its VM management functions. label Mar 5, 2025