Hello! For the life of me, I cannot get SLURMCluster (Dask) and cuGraph to cooperate. I can get many configurations of SLURMCluster with Dask and cuDF to work, but cuGraph fails in various ways: a generic cuFile error, modules that do not exist, or code that runs indefinitely. All existing documentation appears to use LocalCUDACluster, which does not work for my setup. Is LocalCUDACluster even truly multi-node multi-GPU, or is it only single-node multi-GPU?
I know my environments are consistent and up to date.
I am looking for better examples, or would be happy to hop on a quick call.
Thank you!!!!!
Code of Conduct
I agree to follow cuGraph's Code of Conduct
I have searched the open issues and have found no duplicates for this question
@williamcolegithub thank you for reaching out. Can you please provide more information on how you set up your cluster?
"All existing documentation appears to be with LocalCUDACluster"
LocalCUDACluster only supports single-node multi-GPU. To run on multiple nodes, you will need to start the scheduler on one of your nodes with dask-scheduler and start each worker with a CLI command like dask-cuda-worker.
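The launch pattern described above can be sketched as follows. This is a minimal illustration, not an official recipe: the helper names `build_launch_commands` and `connect`, the node names, and the shared scheduler-file path are all hypothetical, and how the commands actually get executed on each node (for example through a Slurm batch script) is left to the site setup. The sketch assumes a filesystem visible to every node so that the scheduler and workers can find each other through a scheduler file.

```python
import shlex


def build_launch_commands(scheduler_file, worker_nodes):
    """Return the shell commands to run on each node (hypothetical helper).

    scheduler_file: path on a filesystem shared by all nodes (assumption).
    worker_nodes:   names of the nodes that should host GPU workers.
    """
    # One scheduler for the whole cluster; it writes its address to the file.
    scheduler_cmd = ["dask-scheduler", "--scheduler-file", scheduler_file]
    # The same worker command is run once on every node; it reads the
    # scheduler address from the shared file.
    worker_cmd = ["dask-cuda-worker", "--scheduler-file", scheduler_file]
    return {
        "scheduler": shlex.join(scheduler_cmd),
        "workers": {node: shlex.join(worker_cmd) for node in worker_nodes},
    }


def connect(scheduler_file):
    """Connect a client to the running cluster and set up cuGraph comms.

    Imports are local so that the command builder above can be used on a
    machine without Dask or cuGraph installed.
    """
    from dask.distributed import Client
    from cugraph.dask.comms import comms as Comms

    client = Client(scheduler_file=scheduler_file)
    # cuGraph's multi-GPU algorithms expect comms to be initialized first.
    Comms.initialize(p2p=True)
    return client
```

For example, `build_launch_commands("/shared/scheduler.json", ["node01", "node02"])` returns one scheduler command and one worker command per node; once those processes are running, `connect("/shared/scheduler.json")` would be called from the client script or notebook.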
@jnke2016 I see! OK, I will reach out to my Slurm team. Yes, I have tried dask-cuda-worker, but it failed to connect to the nanny, so I have been using dask-worker instead.
From my perspective, the documentation did not make it clear that dask-cuda-worker was essential; it appeared optional. Thank you for clarifying and for the fast reply. I will reach out if issues persist.
By any chance, is there a distributed notebook you would recommend? I can only find examples that use LocalCUDACluster, even in notebooks that claim to be multi-node.
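Independently of which notebook is used, one way to check whether a running cluster really spans several nodes is to group the worker addresses by host. The sketch below is only an illustration: `workers_by_host` is a hypothetical helper, and it assumes worker addresses of the form `tcp://host:port` or `ucx://host:port`, such as the keys of `client.scheduler_info()["workers"]` (which may list only a subset of workers depending on the installed version of distributed).

```python
from collections import defaultdict
from urllib.parse import urlparse


def workers_by_host(worker_addresses):
    """Group Dask worker addresses by the host they run on.

    worker_addresses: iterable of strings such as "tcp://10.0.0.1:40001".
    Returns a dict mapping each host to the list of its worker addresses.
    """
    hosts = defaultdict(list)
    for address in worker_addresses:
        # urlparse extracts the host part regardless of the scheme.
        hosts[urlparse(address).hostname].append(address)
    return dict(hosts)
```

If the result contains a single host, the cluster is single-node multi-GPU at best; a truly multi-node cluster should show one entry per node, typically with one worker per GPU.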