ft[torch]: How can we exploit cpu/gpu-parallelization with fabrics. #130
base: develop
Conversation
Thanks @maxspahn, will have a look at it! :)
Thanks, I will have a look, too!
Adds a simple comparison between looping and numpy parallelization (see the sketch below).
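For reference, a minimal sketch of what such a loop-vs-numpy comparison might look like. The cost function here is hypothetical (a simple squared distance), not the actual fabrics expression:

```python
import time
import numpy as np

def point_cost(q, obstacle):
    """Hypothetical per-sample cost: squared distance to an obstacle."""
    return np.sum((q - obstacle) ** 2)

def point_cost_batched(Q, obstacle):
    """Vectorized variant: evaluates all N samples in a single numpy call."""
    return np.sum((Q - obstacle) ** 2, axis=1)

N, dim = 10_000, 7
Q = np.random.rand(N, dim)
obstacle = np.ones(dim)

# Looped evaluation: one Python-level call per sample.
t0 = time.perf_counter()
looped = np.array([point_cost(q, obstacle) for q in Q])
t_loop = time.perf_counter() - t0

# Vectorized evaluation: one call over the full batch.
t0 = time.perf_counter()
batched = point_cost_batched(Q, obstacle)
t_batch = time.perf_counter() - t0

assert np.allclose(looped, batched)
print(f"loop: {t_loop:.4f}s, vectorized: {t_batch:.4f}s")
```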
Okay, a 120x speed-up is indeed relevant. I will see if I can implement it for my case and potentially try to make it work for torch. Thanks!
Also, the speed-up scales with the number of samples, so for 1000 environments it is even bigger. But I wasn't patient enough to wait for the result :D
I have added the translator from the .c function to torch code. I also added some examples using the dingo+kinova and cubic obstacles.
To get an idea of the performance difference between the options (casadi function, parallelized numpy function, parallelized torch function), check these computation times for the dinova example. Take them just as a reference, as the performance can change depending on many things.

[Plot: Casadi (looped) vs Numpy]
[Plot: Numpy vs Torch]

Conclusion: For less than ~100 N, looping the casadi function is best. Between 100 and 10k N, numpy is better as it has less overhead than torch. After 10k N, torch becomes better, especially with very large N, as the computation time seems to stay around 300 ms as long as you have enough VRAM.
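As a hedged illustration of the torch side of that comparison (again with a hypothetical batched cost, not the translated fabrics function), the basic pattern is to move the whole batch to the GPU once and evaluate it in a single call; per-sample transfers are what make torch lose at small N, and VRAM is what bounds the batch size at large N:

```python
import torch

def point_cost_torch(Q, obstacle):
    """Hypothetical batched cost in torch; in the PR this would be the
    function translated from the generated .c code."""
    return torch.sum((Q - obstacle) ** 2, dim=1)

device = "cuda" if torch.cuda.is_available() else "cpu"
N, dim = 100_000, 7

# Allocate the whole batch on the device in one go; transferring samples
# one by one would dominate the runtime for small N.
Q = torch.rand(N, dim, device=device)
obstacle = torch.ones(dim, device=device)

with torch.no_grad():  # no gradients needed for pure evaluation
    costs = point_cost_torch(Q, obstacle)

print(costs.shape)  # torch.Size([100000])
```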
The idea behind this PR is simple:
A very similar approach should be applicable to torch.
@saraybakker1 @AndreuMatoses