When a process is killed by an external factor, it leaves its lock file in place, which prevents any future process from running.
In a Slurm environment that requeues jobs, this results in a set of jobs that wait forever.
In a cf environment, this requires clearing out the lock files and associated output directories once all parallel runs that work on the same cache directory have finished.
The ideal fix would be something like a termination handler: trap SIGINT/SIGTERM (SIGKILL cannot be trapped) and make sure the job removes its lock file and output folder before exiting.
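A minimal sketch of such a handler in Python, assuming the job knows the paths of its own lock file and output folder. The names `make_cleanup` and `install_handlers` are hypothetical and not part of the existing code. SIGKILL cannot be caught by any handler, so this only covers SIGINT, SIGTERM, and normal interpreter exit.

```python
import atexit
import os
import shutil
import signal
import sys


def make_cleanup(lock_path, output_dir):
    """Build a function that removes the lock file and the output folder.

    Both names are hypothetical; the real job would pass its own paths.
    """
    def cleanup():
        # Missing paths are ignored, so calling cleanup twice is harmless.
        try:
            os.remove(lock_path)
        except FileNotFoundError:
            pass
        shutil.rmtree(output_dir, ignore_errors=True)

    return cleanup


def install_handlers(cleanup):
    """Run cleanup on normal exit and on SIGINT/SIGTERM.

    Must be called from the main thread, as signal.signal requires.
    """
    def handle_signal(signum, frame):
        # sys.exit raises SystemExit, which lets the atexit hook run.
        sys.exit(128 + signum)

    atexit.register(cleanup)
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, handle_signal)
```

A job would call `install_handlers(make_cleanup(lock_path, output_dir))` right after it acquires the lock. A lock left behind by SIGKILL or a node failure still needs the manual cleanup described above.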
If we want some jobs to restart where they left off, we need a slight modification to the code to indicate that the job can continue.
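One way that modification could look, as a sketch only: an interrupted job leaves a marker file in its output folder instead of deleting everything, and the next process that finds the lock checks for that marker. The marker name `RESUMABLE` and the functions `mark_resumable` and `acquire` are assumptions for illustration, not the current locking code.

```python
import os
from pathlib import Path

RESUME_MARKER = "RESUMABLE"  # hypothetical marker file name


def mark_resumable(output_dir):
    """Called by an interrupted job to say its partial output can be reused."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / RESUME_MARKER).touch()


def acquire(lock_path, output_dir):
    """Return 'fresh', 'resume', or 'busy' for this job.

    'fresh'  : no lock existed, and we created it.
    'resume' : a lock existed, but its owner marked the job resumable.
    'busy'   : a lock existed with no marker, so another process owns it.
    """
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return "fresh"
    except FileExistsError:
        try:
            # Only one process can unlink the marker, so only one resumes.
            (Path(output_dir) / RESUME_MARKER).unlink()
            return "resume"
        except FileNotFoundError:
            return "busy"
```

In this sketch the termination handler would call `mark_resumable` rather than removing the lock file and output folder, and a requeued job that gets `'resume'` would continue from the existing output.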