
puma plugin doesn't start properly in phased restarts and also crashes them #563


Open
pushcx opened this issue May 15, 2025 · 0 comments


pushcx commented May 15, 2025

Two closely related issues here. I started using plugin :solid_queue in my service to have puma manage the Solid Queue worker process, and it has broken phased restarts, causing an outage on every deployment because a single VPS serves my site. This looks like a design issue, at odds with the project's goal of enabling straightforward single-server use.

We do phased restarts whenever possible (on deploys that don't touch the bundle) so that puma replaces workers one at a time and the deploy has zero downtime.
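For context, a phased-restart setup of this kind looks roughly like the following puma config. It is a hypothetical sketch, not our actual config: the worker count is illustrative, and preload_app! is deliberately absent because puma does not allow phased restarts when the app is preloaded.

```ruby
# config/puma.rb (hypothetical sketch; values are illustrative)

# Cluster mode is required for phased restarts.
workers 2

# No preload_app! here: puma refuses phased restarts for a preloaded app,
# so each worker boots the Rails app on its own.

# Have puma run the Solid Queue supervisor (the setup described in this issue).
plugin :solid_queue
```

A phased restart is then triggered with pumactl phased-restart, or by sending SIGUSR1 to the puma master process.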

The first issue is that the plugin crashes on start because it assumes the Rails app is preloaded. I found a workaround (our commit) that seems to work, but I suspect it breaks phased restarts in a way that is currently masked by the second issue.

The second issue is that the plugin deliberately crashes puma during phased restarts. In the logs, I see:

May 15 15:31:53 lobste.rs puma[282083]: [282083] - Starting phased worker restart, phase: 1
May 15 15:31:53 lobste.rs puma[282083]: [282083] + Changing to /srv/lobste.rs/http
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - TERM sent to 282297...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:55 lobste.rs puma[282083]: [282083] Detected Solid Queue has gone away, stopping Puma...
May 15 15:31:55 lobste.rs puma[282083]: [282083] - Gracefully shutting down workers...
May 15 15:32:24 lobste.rs puma[282083]: [282083] === puma shutdown: 2025-05-15 15:32:24 +0000 ===

Solid Queue doesn't seem to understand a phased restart and is inappropriately halting the puma supervisor.
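The log shows the master deciding that Solid Queue had "gone away" about two seconds after the phased restart began. As a hypothetical illustration (this is not the plugin's code, and child_gone? is a made-up name), a pid-polling liveness check built on Ruby's Process.waitpid with WNOHANG only reports whether the child process still exists, so on its own it cannot tell a deliberate stop during a restart from a crash:

```ruby
# Hypothetical liveness check; not Solid Queue's implementation.
# Returns true once the forked child has exited (or is not our child).
def child_gone?(pid)
  # With WNOHANG, waitpid returns nil while the child is still running,
  # and returns the pid once the child has exited (reaping it).
  !Process.waitpid(pid, Process::WNOHANG).nil?
rescue Errno::ECHILD
  # Already reaped, or never our child: treat it as gone.
  true
end

pid = fork { sleep 0.2; exit!(0) } # exit! skips at_exit hooks in the child
running_check = child_gone?(pid)   # child is still sleeping
sleep 0.5
gone_check = child_gone?(pid)      # child has exited; this call reaps it
again_check = child_gone?(pid)     # already reaped, so waitpid raises ECHILD
```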

A few seconds later, systemd notices that the puma service has stopped and cold-starts it. It takes 30+ seconds for a single worker to boot, and by then the nginx queue has filled up, so we serve a lot of 502s and limp back to normal service while the workers get hammered.
