run bpipe in parallel #58

Open
lonsbio opened this issue Oct 20, 2014 · 3 comments


lonsbio commented Oct 20, 2014

From [email protected] on 2012-08-20T12:09:09Z

I tried to run bpipe on a cluster with qsub support. I tested it on a simple pipeline but got this error message:

...
Pipeline failed!

Job runner bpipe.TorqueCommandExecutor failed to return a job id despite reporting success exit code for command:

bash /mnt/Home/zhuw/prj/dn/bpipe/bpipe-0.9.5.3/bin/../bin/bpipe-torque.sh start

Raw output was:[
]
...

Could anyone provide a good example, including the config settings, that I could use as a test?

Thanks,

Wei

Original issue: http://code.google.com/p/bpipe/issues/detail?id=58


lonsbio commented Oct 20, 2014

From [email protected] on 2013-09-28T22:27:34Z

I have seen the same error; this appears to be due to status UNKNOWN being returned from the polling.

.bpipe/logs/2832.bpipe.log (excerpt):
bpipe.executor.CustomCommandExecutor INFO |11:36:09 Poll returned new status for command 472700: UNKNOWN

.bpipe/logs/2832.log (excerpt):

| Starting Pipeline at 2013-09-28 23:36 |

=========================================== Stage hello ============================================
Cleaned up file test.hello.txt to .bpipe/trash/test.hello.txt
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.codehaus.groovy.tools.GroovyStarter.rootLoader(GroovyStarter.java:106)
    at org.codehaus.groovy.tools.GroovyStarter.main(GroovyStarter.java:128)
Caused by: groovy.lang.MissingMethodException: No signature of method: java.util.logging.Logger.warn() is applicable for argument types: (org.codehaus.groovy.runtime.GStringImpl) values: [Job status query returned UNKNOWN for job 472700. Will retry.]
Possible solutions: wait(), wait(long), any(), wait(long, int), warning(java.lang.String), any(groovy.lang.Closure)
    at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.unwrap(ScriptBytecodeAdapter.java:55)
    at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:46)
    at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
    at bpipe.executor.CustomCommandExecutor.waitFor(CustomCommandExecutor.groovy:315)
....
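
For readers following the trace: the `MissingMethodException` arises because `java.util.logging.Logger` has no `warn()` method; the equivalent call is `warning(String)`, as the "Possible solutions" line hints. Below is a minimal Java sketch of the working call. The class name `WarnDemo`, the helpers `retryMessage` and `logRetry`, and the capturing handler are illustrative assumptions, not bpipe code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class WarnDemo {
    // Hypothetical helper: rebuilds the message text shown in the stack trace.
    static String retryMessage(String jobId) {
        return "Job status query returned UNKNOWN for job " + jobId + ". Will retry.";
    }

    // Logs the retry message at WARNING level and returns the records
    // captured by a temporary handler (used here only to make the call observable).
    static List<LogRecord> logRetry(Logger log, String jobId) {
        List<LogRecord> seen = new ArrayList<>();
        Handler capture = new Handler() {
            @Override public void publish(LogRecord record) { seen.add(record); }
            @Override public void flush() {}
            @Override public void close() {}
        };
        log.addHandler(capture);
        try {
            // java.util.logging.Logger provides warning(String), not warn(...).
            log.warning(retryMessage(jobId));
        } finally {
            log.removeHandler(capture);
        }
        return seen;
    }

    public static void main(String[] args) {
        List<LogRecord> records = logRetry(Logger.getLogger("demo"), "472700");
        LogRecord first = records.get(0);
        System.out.println(first.getLevel() + ": " + first.getMessage());
    }
}
```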


lonsbio commented Oct 20, 2014

From [email protected] on 2013-09-29T04:15:08Z

Thanks for posting the stack trace. I've fixed the problem that occurred there. I'm not sure whether it will address this issue: the error is in the handling of the UNKNOWN status, so the fix will not prevent the UNKNOWN status from occurring. However, the intent of the code is to retry when UNKNOWN is returned, so it may behave better now that it can retry. I will be releasing a new version soon with this fix in it.
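
To illustrate the retry intent described here, the following is a rough Java sketch of a poll loop that tolerates a bounded number of consecutive UNKNOWN results. It is not the code in `CustomCommandExecutor`; the names `PollRetry`, `Status`, `pollWithRetry`, and the parameters `maxUnknown` and `sleepMillis` are all assumptions made for the example.

```java
import java.util.function.Supplier;

public class PollRetry {
    // Hypothetical status set; the real executor may distinguish more states.
    enum Status { RUNNING, COMPLETE, UNKNOWN }

    // Polls until a definite status arrives, or until maxUnknown consecutive
    // UNKNOWN results have been seen, in which case UNKNOWN is returned.
    static Status pollWithRetry(Supplier<Status> poll, int maxUnknown, long sleepMillis) {
        int unknowns = 0;
        while (true) {
            Status status = poll.get();
            if (status != Status.UNKNOWN) {
                return status;
            }
            unknowns++;
            if (unknowns >= maxUnknown) {
                return Status.UNKNOWN;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up polling.
                Thread.currentThread().interrupt();
                return Status.UNKNOWN;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated scheduler: two UNKNOWN answers, then COMPLETE.
        Status[] answers = { Status.UNKNOWN, Status.UNKNOWN, Status.COMPLETE };
        int[] next = { 0 };
        Status result = pollWithRetry(() -> answers[next[0]++], 5, 10);
        System.out.println(result);
    }
}
```

Capping the retries keeps a job that the scheduler has genuinely lost from blocking the pipeline forever.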


lonsbio commented Oct 20, 2014

From [email protected] on 2013-09-29T10:12:46Z

In our case the jobs are removed from the cluster after completion, so qstat no longer reports them (which triggers the UNKNOWN status in the bpipe-torque.sh script). I can check on alternatives with our cluster admins.

We have a similar issue with job submission: if we run 'bpipe run' itself as a job (e.g. on a worker node), submission fails because our worker nodes are not configured to allow job submission. We have a script to get around this, but it requires modifying the bpipe-torque.sh script. I'm wondering whether there is a way to make the two (job submission and job polling) more generic or flexible?
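
One possible shape for such flexibility, sketched in Java purely as an illustration: put submission and polling behind a small interface so a site can swap in its own logic, and wrap it with a decorator that reads "job no longer known to the scheduler" as finished once the job has been seen at least once. Everything here (`PluggablePolling`, `JobBackend`, `PurgeAwareBackend`, the `Status` values) is hypothetical and not part of bpipe.

```java
import java.util.HashSet;
import java.util.Set;

public class PluggablePolling {
    enum Status { QUEUED, RUNNING, COMPLETE, UNKNOWN }

    // Hypothetical seam: site-specific submit and poll logic lives behind this.
    interface JobBackend {
        String submit(String command);   // returns a job id
        Status status(String jobId);     // current scheduler view of the job
    }

    // Decorator: if a job was visible earlier and the scheduler now reports
    // UNKNOWN, assume it finished and was purged from the queue.
    static class PurgeAwareBackend implements JobBackend {
        private final JobBackend inner;
        private final Set<String> seen = new HashSet<>();

        PurgeAwareBackend(JobBackend inner) {
            this.inner = inner;
        }

        @Override
        public String submit(String command) {
            return inner.submit(command);
        }

        @Override
        public Status status(String jobId) {
            Status status = inner.status(jobId);
            if (status == Status.UNKNOWN) {
                return seen.contains(jobId) ? Status.COMPLETE : Status.UNKNOWN;
            }
            seen.add(jobId);
            return status;
        }
    }

    public static void main(String[] args) {
        // Fake backend: the job runs once, then disappears from the scheduler.
        Status[] answers = { Status.RUNNING, Status.UNKNOWN };
        int[] next = { 0 };
        JobBackend fake = new JobBackend() {
            @Override public String submit(String command) { return "472700"; }
            @Override public Status status(String jobId) { return answers[next[0]++]; }
        };
        JobBackend backend = new PurgeAwareBackend(fake);
        String id = backend.submit("echo hello");
        System.out.println(backend.status(id));
        System.out.println(backend.status(id));
    }
}
```

This heuristic cannot recover the job's exit code, so a real solution would still need another source for success or failure, such as an exit-status file written by the job itself.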
