puma plugin doesn't start properly in phased restarts and also crashes them #563

pushcx · 2025-05-15T15:47:41Z

Two closely related issues here. I started using plugin :solid_queue in my service to have puma manage the Solid Queue worker process, and it has broken phased restarts. Because a single VPS serves my site, that causes an outage on every deployment. This seems like a design issue that runs contrary to the project's goal of enabling straightforward single-server use.

We do a phased restart whenever possible (deploys that don't touch the bundle), so puma replaces workers one at a time and the deploy has zero downtime.
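For context, here is a minimal sketch of the kind of config/puma.rb this setup implies. It is illustrative rather than copied from the real configuration: the worker count is an assumed value, and the only line taken from this report is plugin :solid_queue.

```ruby
# config/puma.rb: illustrative sketch, not the site's real configuration.

# Cluster mode, so puma has several workers to replace one at a time.
# The count of 2 is an assumed value.
workers 2

# Phased restarts are not available when the app is preloaded in the
# master process, so preloading is turned off explicitly.
preload_app! false

# Have puma start and supervise the Solid Queue process.
plugin :solid_queue
```

With a config along these lines, a phased restart is requested with pumactl phased-restart or by sending SIGUSR1 to the puma master process.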

The first issue is that the plugin crashes on start because it assumes the Rails app is preloaded. I found a workaround (our commit) that might be working, but I suspect it breaks phased restarts in a way that's currently masked by the second issue.

The second issue is that the plugin deliberately crashes puma during phased restarts. In the logs, I see:

May 15 15:31:53 lobste.rs puma[282083]: [282083] - Starting phased worker restart, phase: 1
May 15 15:31:53 lobste.rs puma[282083]: [282083] + Changing to /srv/lobste.rs/http
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - TERM sent to 282297...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:53 lobste.rs puma[282083]: [282083] - Stopping 282297 for phased upgrade...
May 15 15:31:55 lobste.rs puma[282083]: [282083] Detected Solid Queue has gone away, stopping Puma...
May 15 15:31:55 lobste.rs puma[282083]: [282083] - Gracefully shutting down workers...
May 15 15:32:24 lobste.rs puma[282083]: [282083] === puma shutdown: 2025-05-15 15:32:24 +0000 ===

Solid Queue doesn't seem to understand a phased restart and is inappropriately halting the puma supervisor.

A few seconds later, systemd notices that the puma service has crashed and cold-starts it. A single worker takes 30+ seconds to start, and by then the nginx queue has filled up, so we throw a lot of 502s and limp back into normal service while the workers get hammered.
