Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel execution issues on macOS #312

Open
slaperche-scality opened this issue Jul 4, 2019 · 7 comments
Open

Parallel execution issues on macOS #312

slaperche-scality opened this issue Jul 4, 2019 · 7 comments

Comments

@slaperche-scality
Copy link
Contributor

slaperche-scality commented Jul 4, 2019

(sorry for the wall of text, I tried to give as much info as possible, feel free to ask me clarification if something is unclear)

Context

It seems like the parallel execution feature of doit doesn't work on macOS (I noticed it on a coworker's laptop, being on Linux I never noticed any issue on my side).

I investigated on my side and came up with a satisfying workaround/fix (force the use of threads on macOS).

Long story short, this is related to the fact that by default:

  • doit uses multiprocessing for parallel execution
  • multiprocessing uses fork on Unix-like system
  • recent version of macOS introduced a safety check against mixing forks and threads.

For more information you can take a look at the issue I tackled on our side (scality/metalk8s#1354).

Here is a minimal dodo.py that reproduce the problem if you exectute on macOS (10.13 or higher):

#!/usr/bin/env python3
# coding: utf-8

import random
import time
import urllib.request

import doit


def task_foo():
    def action():
        urllib.request.urlopen('http://example.com/')
        time.sleep(random.randint(1, 5))

    for i in range(10):
        yield {
            'name': str(i),
            'actions': [action]
        }

To be executed with doit -n 4.

Note that the crash happen only when you're calling into Cocoa libraries (that why I added an urllib call, simply sleeping doesn't exhibit the behavior).

What can be done

I know it's not really an doit issue per se, and after reading this very informative thread on the Python bugtracker one can see that the issue will probably disappear with Python 3.8 (where multiprocessing will be using spawn instead of fork by default).

But I believe that we may want to do something on doit side because Python 3.8 is not out yet (and people will keep using older Python for a while), so it may be interesting to have a fix/workaround in doit itself.
I think that the issue can be solved by calling either multiprocessing.set_start_method('forkserver') or multiprocessing.set_start_method('spawn') (I would go for spawn since official Python is moving that way) before creating any subprocesses in doit.

If it doesn't work/it's too much work/outside the scope of doit, it's possible to instead mention the issue in the documentation about parallel execution (so that people are not surprised/know how to deal with).
Or maybe simply disabling the use of multiprocessing on macOS (and using only threads).

In any case, if you want to keep the support for multiprocessing on macOS, you may want to take a look now because the current code will not work in 3.8 (which will be using spawn by default): adding a call to multiprocessing.set_start_method('spawn') in the provided dodo.py triggers a pickle error (which is not totally unexpected as spawn has more constraints thant fork, see here).

Environment

  1. OS: macOS 10.14.3
  2. Python version: 3.7.3
  3. doit version: 0.31.1
Fund with Polar
@schettino72
Copy link
Member

Thanks for clear explanation.

I just tried doit with following patch:

diff --git a/doit/__main__.py b/doit/__main__.py
index b6a1481..b7818b7 100644
--- a/doit/__main__.py
+++ b/doit/__main__.py
@@ -1,6 +1,9 @@
 # lazy way to ignore coverage in this file
 if True: # pragma: no cover
     def main():
+        import multiprocessing as mp
+        mp.set_start_method('spawn')
+
         import sys
 
         from doit.doit_cmd import DoitMain

I test this using doit own dodo.py task pyflakes.

Works for me on linux.
forkserver also worked but was considerably slower.

I suggest a command line argument to control this...
Or maybe an ENV variable. Docs mention that this must be on __main__ section. Again my patch works on linux but not sure about other platforms.
And make fork mode not available on MAC.

Regarding changing the default. I am not clear in which situation you would have pickle problems with spawn but not with fork. Could you provide a minimum example?

@slaperche-scality
Copy link
Contributor Author

Yes forkserver is notoriously slower (that's probably why Python is moving to spawn instead of forkserver for macOS).

Using the dodo.py file given in the first message and applying your patch to the doit/__main__.py in my virtualenv, when I run doit -n 4 I get the following traceback (even on Linux):

Traceback (most recent call last):
  File "/home/sylvain/dev/poc_doit/.venv/lib/python3.7/site-packages/doit/doit_cmd.py", line 177, in run
    return command.parse_execute(args)
  File "/home/sylvain/dev/poc_doit/.venv/lib/python3.7/site-packages/doit/cmd_base.py", line 127, in parse_execute
    return self.execute(params, args)
  File "/home/sylvain/dev/poc_doit/.venv/lib/python3.7/site-packages/doit/cmd_base.py", line 420, in execute
    return self._execute(**exec_params)
  File "/home/sylvain/dev/poc_doit/.venv/lib/python3.7/site-packages/doit/cmd_run.py", line 260, in _execute
    return runner.run_all(self.control.task_dispatcher())
  File "/home/sylvain/dev/poc_doit/.venv/lib/python3.7/site-packages/doit/runner.py", line 256, in run_all
    self.run_tasks(task_dispatcher)
  File "/home/sylvain/dev/poc_doit/.venv/lib/python3.7/site-packages/doit/runner.py", line 459, in run_tasks
    proc_list = self._run_start_processes(job_q, result_q)
  File "/home/sylvain/dev/poc_doit/.venv/lib/python3.7/site-packages/doit/runner.py", line 432, in _run_start_processes
    process.start()
  File "/usr/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'task_foo.<locals>.action'

According to the documentation spawn has a few extra restriction which don’t apply to the fork start method (e.g.: see the "More picklability" section).

@schettino72
Copy link
Member

Oh, sorry. You had already given an example.

So the problem is usage of a closure as an action. Windows also has this limitation...

Yes, I saw the docs regarding extra restrictions. Although it doesnt mention closures and i am not sure which restrictions from docs is the offending one.
I would not know how to make closures work.

I will stick to my proposed solution of adding a command line argument to control this. And by default just use python's default. Maybe add a note on docs for MAC and Windows users to not use closures on actions.

@slaperche-scality
Copy link
Contributor Author

Ha yes, indeed: moving action outside (and thus making it a normal function instead of a closure) solve the issue.

I'll do the test on the macOS of my coworker on Monday to confirm that it works there as well, but that looks good to me 🙂

@slaperche-scality
Copy link
Contributor Author

slaperche-scality commented Jul 8, 2019

I've just tried your patch on macOS and it works as expected.

@schettino72
Copy link
Member

good, but i dont want to force use spawn on linux...

So a fix/patch would need to:

  • multiprocessing start method set to spawn by default on mac
  • add a command line argument to control start method
  • document that closures on actions are not supported on windows/mac

@slaperche-scality
Copy link
Contributor Author

Looks good to me 🙂

Maybe add a check if someone set fork on mac (using the command line argument) and raise an error or print a warning in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants