Skip to content

Commit

Permalink
[FIX] core: SIGXCPU in worker processes
Browse files Browse the repository at this point in the history
odoo#30688 (6ce2d6e) added an
indirection in prefork workers: Python-level signal handlers are
delayed until native calls have ended (e.g. accept() or
execute()). Running the actual work in a sub-thread allowed the main
thread to handle signals in all cases.

However there is apparently an issue with SIGXCPU on linux (possibly
other cases as well): SIGXCPU is delivered to the child thread (if
possible?) and Thread.join apparently stops it from redelivered to
the main thread (Thread.join is signal-interruptible since 3.2 but
possibly not Python-interruptible).

Blocking SIGXCPU on the child thread causes the OS to deliver on the
main thread and fixes the issue.

Also split set_limits so it sets the signal handler in the parent
thread but properly updates the soft limit in the child after each
request, as the goal is to put a hard limit on the CPU time per
request, not on the worker. 6ce2d6e would set the limit once then
never update it, likely cycling workers more than desired.

While at it:

* block other signals with a handler set, they seem to work
  regardless on linux but other OS may have a different way of
  dispatching process-directed signals
* unset signals which are set by the prefork server but whose
  set behavior makes no sense in workers:

  - TERM and CHLD were already unset
  - HUP is used to restart the server, workers can just be killed
  - TTIN and TTOU configure the number of workers

closes odoo#39723

X-original-commit: 549bd19
Signed-off-by: Xavier Morel (xmo) <[email protected]>
  • Loading branch information
xmo-odoo authored and fw-bot committed Nov 4, 2019
1 parent bb49030 commit 659fcfe
Showing 1 changed file with 19 additions and 10 deletions.
29 changes: 19 additions & 10 deletions odoo/service/server.py
Original file line number Diff line number Diff line change
Expand Up @@ -883,6 +883,13 @@ def close(self):
def signal_handler(self, sig, frame):
self.alive = False

def signal_time_expired_handler(self, n, stack):
# TODO: print actual RUSAGE_SELF (since last check_limits) instead of
# just repeating the config setting
_logger.info('Worker (%d) CPU time limit (%s) reached.', self.pid, config['limit_time_cpu'])
# We dont suicide in such case
raise Exception('CPU time limit exceeded.')

def sleep(self):
try:
select.select([self.multi.socket, self.wakeup_fd_r], [], [], self.multi.beat)
Expand All @@ -909,15 +916,9 @@ def check_limits(self):

set_limit_memory_hard()

def set_limits(self):
# SIGXCPU (exceeded CPU time) signal handler will raise an exception.
# update RLIMIT_CPU so limit_time_cpu applies per unit of work
r = resource.getrusage(resource.RUSAGE_SELF)
cpu_time = r.ru_utime + r.ru_stime
def time_expired(n, stack):
_logger.info('Worker (%d) CPU time limit (%s) reached.', self.pid, config['limit_time_cpu'])
# We dont suicide in such case
raise Exception('CPU time limit exceeded.')
signal.signal(signal.SIGXCPU, time_expired)
soft, hard = resource.getrlimit(resource.RLIMIT_CPU)
resource.setrlimit(resource.RLIMIT_CPU, (cpu_time + config['limit_time_cpu'], hard))

Expand All @@ -938,11 +939,15 @@ def start(self):
self.multi.socket.setblocking(0)

signal.signal(signal.SIGINT, self.signal_handler)
signal.signal(signal.SIGXCPU, self.signal_time_expired_handler)

signal.signal(signal.SIGTERM, signal.SIG_DFL)
signal.signal(signal.SIGHUP, signal.SIG_DFL)
signal.signal(signal.SIGCHLD, signal.SIG_DFL)
signal.set_wakeup_fd(self.wakeup_fd_w)
signal.signal(signal.SIGTTIN, signal.SIG_DFL)
signal.signal(signal.SIGTTOU, signal.SIG_DFL)

self.set_limits()
signal.set_wakeup_fd(self.wakeup_fd_w)

def stop(self):
pass
Expand All @@ -964,14 +969,18 @@ def run(self):
sys.exit(1)

def _runloop(self):
signal.pthread_sigmask(signal.SIG_BLOCK, {
signal.SIGXCPU,
signal.SIGINT, signal.SIGQUIT, signal.SIGUSR1,
})
try:
while self.alive:
self.check_limits()
self.multi.pipe_ping(self.watchdog_pipe)
self.sleep()
if not self.alive:
break
self.process_work()
self.check_limits()
except:
_logger.exception("Worker %s (%s) Exception occured, exiting...", self.__class__.__name__, self.pid)
sys.exit(1)
Expand Down

0 comments on commit 659fcfe

Please sign in to comment.