Process termination needs to be handled properly #216

Open
satra opened this issue Mar 30, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@satra
Contributor

satra commented Mar 30, 2020

when a process gets killed by external factors, it will leave the lockfile in place. this will prevent any future process from running.

in a slurm environment that requeues, this will result in a set of jobs that will wait forever.

in a cf environment, this requires clearing out lock files and associated output directories when all parallel runs that do work on the same cache directory are finished.

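but the ideal way should be something like a termination handler: trapping sigint/sigterm and making sure the job removes the lock file and output folder. note that sigkill itself cannot be trapped, so a hard kill would still leave the lockfile behind and need the manual cleanup.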
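
a minimal sketch of such a termination handler, assuming the lock is a plain file on disk and the job knows its own output folder. the function names and paths here are hypothetical and not taken from the project's code; only SIGINT and SIGTERM are registered because SIGKILL cannot be caught.

```python
import shutil
import signal
import sys
from pathlib import Path


def make_cleanup_handler(lockfile, output_dir):
    """Build a signal handler that removes the lock file and output folder."""
    lockfile, output_dir = Path(lockfile), Path(output_dir)

    def handler(signum, frame):
        # Drop the lock so a requeued or later job is not blocked.
        if lockfile.exists():
            lockfile.unlink()
        # Drop partial results so the next run starts from a clean state.
        shutil.rmtree(output_dir, ignore_errors=True)
        # Exit with the conventional 128 + signal number status.
        sys.exit(128 + signum)

    return handler


def install_cleanup_handler(lockfile, output_dir,
                            signals=(signal.SIGINT, signal.SIGTERM)):
    """Register the cleanup handler; must be called from the main thread."""
    handler = make_cleanup_handler(lockfile, output_dir)
    for sig in signals:
        signal.signal(sig, handler)
    return handler
```

a job would call `install_cleanup_handler(lockfile, output_dir)` right after acquiring the lock. whether the scheduler sends a catchable signal before the hard kill depends on the cluster configuration, so this only covers the cases where it does.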

if we want some jobs to restart where they left off, we need a slight modification to the code to indicate that the job can continue.
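
one way this could look, as a rough sketch only: on termination the lock is always released, but a job flagged as resumable keeps its output folder and leaves a small marker describing where it stopped. the marker filename, the `resumable` flag and both function names are assumptions made for illustration, not existing project API.

```python
import json
import shutil
from pathlib import Path

MARKER = "_resumable.json"  # hypothetical marker filename


def release_on_termination(lockfile, output_dir, resumable=False, state=None):
    """Release the lock; keep the output folder only for resumable jobs."""
    lockfile, output_dir = Path(lockfile), Path(output_dir)
    if resumable:
        # Keep partial results and record where the job stopped.
        output_dir.mkdir(parents=True, exist_ok=True)
        (output_dir / MARKER).write_text(json.dumps(state or {}))
    else:
        # Non-resumable jobs start over, so partial output is discarded.
        shutil.rmtree(output_dir, ignore_errors=True)
    if lockfile.exists():
        lockfile.unlink()


def load_resume_state(output_dir):
    """Return the saved state if a previous run left a marker, else None."""
    marker = Path(output_dir) / MARKER
    if marker.exists():
        return json.loads(marker.read_text())
    return None
```

a restarted job would call `load_resume_state(output_dir)` first and continue from the saved state when it is not `None`.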

satra added the bug label Mar 30, 2020
@mgxd
Contributor

mgxd commented Jun 16, 2020

this sounds like it could be grouped in with #32
