Skip to content

Commit 28724fd

Browse files
committed
Report failure to start a background worker.
When a worker is flagged as BGW_NEVER_RESTART and we fail to start it, or if it is not marked BGW_NEVER_RESTART but is terminated before startup succeeds, what BgwHandleStatus should be reported? The previous code really hadn't considered this possibility (as indicated by the comments which ignore it completely) and would typically return BGWH_NOT_YET_STARTED, but that's not a good answer, because then there's no way for code using GetBackgroundWorkerPid() to tell the difference between a worker that has not started but will start later and a worker that has not started and will never be started. So, when this case happens, return BGWH_STOPPED instead. Update the comments to reflect this. The preceding fix by itself is insufficient to fix the problem, because the old code also didn't send a notification to the process identified in bgw_notify_pid when startup failed. That might've been technically correct under the theory that the status of the worker was BGWH_NOT_YET_STARTED, because the status would indeed not change when the worker failed to start, but now that we're more usefully reporting BGWH_STOPPED, a notification is needed. Without these fixes, code which starts background workers and then uses the recommended APIs to wait for those background workers to start would hang indefinitely if the postmaster failed to fork a worker. Amit Kapila and Robert Haas Discussion: http://postgr.es/m/CAA4eK1KDfKkvrjxsKJi3WPyceVi3dH1VCkbTJji2fuwKuB=3uw@mail.gmail.com
1 parent 9c64ddd commit 28724fd

File tree

2 files changed

+24
-8
lines changed

2 files changed

+24
-8
lines changed

src/backend/postmaster/bgworker.c

+15-8
Original file line numberDiff line numberDiff line change
@@ -1034,14 +1034,18 @@ RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
10341034
* Get the PID of a dynamically-registered background worker.
10351035
*
10361036
* If the worker is determined to be running, the return value will be
1037-
* BGWH_STARTED and *pidp will get the PID of the worker process.
1038-
* Otherwise, the return value will be BGWH_NOT_YET_STARTED if the worker
1039-
* hasn't been started yet, and BGWH_STOPPED if the worker was previously
1040-
* running but is no longer.
1037+
* BGWH_STARTED and *pidp will get the PID of the worker process. If the
1038+
* postmaster has not yet attempted to start the worker, the return value will
1039+
* be BGWH_NOT_YET_STARTED. Otherwise, the return value is BGWH_STOPPED.
10411040
*
1042-
* In the latter case, the worker may be stopped temporarily (if it is
1043-
* configured for automatic restart and exited non-zero) or gone for
1044-
* good (if it exited with code 0 or if it is configured not to restart).
1041+
* BGWH_STOPPED can indicate either that the worker is temporarily stopped
1042+
* (because it is configured for automatic restart and exited non-zero),
1043+
* or that the worker is permanently stopped (because it exited with exit
1044+
* code 0, or was not configured for automatic restart), or even that the
1045+
* worker was unregistered without ever starting (either because startup
1046+
* failed and the worker is not configured for automatic restart, or because
1047+
* TerminateBackgroundWorker was used before the worker was successfully
1048+
* started).
10451049
*/
10461050
BgwHandleStatus
10471051
GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
@@ -1066,8 +1070,11 @@ GetBackgroundWorkerPid(BackgroundWorkerHandle *handle, pid_t *pidp)
10661070
* time, but we assume such changes are atomic. So the value we read
10671071
* won't be garbage, but it might be out of date by the time the caller
10681072
* examines it (but that's unavoidable anyway).
1073+
*
1074+
* The in_use flag could be in the process of changing from true to false,
1075+
* but if it is already false then it can't change further.
10691076
*/
1070-
if (handle->generation != slot->generation)
1077+
if (handle->generation != slot->generation || !slot->in_use)
10711078
pid = 0;
10721079
else
10731080
pid = slot->pid;

src/backend/postmaster/postmaster.c

+9
Original file line numberDiff line numberDiff line change
@@ -5909,7 +5909,16 @@ maybe_start_bgworkers(void)
59095909
{
59105910
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
59115911
{
5912+
int notify_pid;
5913+
5914+
notify_pid = rw->rw_worker.bgw_notify_pid;
5915+
59125916
ForgetBackgroundWorker(&iter);
5917+
5918+
/* Report worker is gone now. */
5919+
if (notify_pid != 0)
5920+
kill(notify_pid, SIGUSR1);
5921+
59135922
continue;
59145923
}
59155924

0 commit comments

Comments
 (0)