P2P send recv test gives errors #8074
Comments
On the TPU side, we never really tested send and recv. @jeffhataws I wonder if you guys are actively using
Just tested with
@JackCaoG I am curious if you never use
We (the Google team) have mostly focused on TPU, and since the inter-host connection (ICI in the TPU case) is fast, we were able to get away with FSDP + TP. We have not experimented with PP much.
This modified code
works without errors using
However, if I uncomment
I see the following errors:
🐛 Bug
Trying to test simple xm.send and xm.recv gives an error.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Test code should run without errors
Log output showing error
Environment
Docker image
Nvidia GPUs
OS