
[pull] master from Oneflow-Inc:master #4

Merged
merged 12 commits into from
Feb 25, 2025


@pull pull bot commented Nov 15, 2024

See Commits and Changes for more details.


Created by pull[bot]


@pull pull bot added the ⤵️ pull label Nov 15, 2024
jackalcooper and others added 11 commits December 2, 2024 11:06
Co-authored-by: oneflow-ci-bot <[email protected]>
Co-authored-by: JiangDongHua <[email protected]>
Co-authored-by: oneflow-ci-bot <[email protected]>
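- Advance the decoupling of CUDA NCCL from OneFlow, which are currently deeply bound together. Refactor EagerCclCommMgr, ccl::Comm, and related modules so that kernels can directly use device-agnostic (primitive-like) ccl communication calls instead of calling nccl APIs directly, paving the way for multi-device compatibility.
- When supporting or adapting to different devices (cuda/npu/xpu, etc.) in the future, kernels and any other code that calls communication APIs should, in principle, not call device-coupled communication APIs such as nccl directly. Instead, they should use the base-class APIs oneflow::ccl::Send/Recv/AllReduce/.... (located in the `oneflow/user/kernels/collective_communication/include` directory), with subclass implementations provided per device.
- Going forward, each device needs to inherit from the oneflow::ccl communication APIs and implement its own subclass communication APIs.
  - For example, the cuda device needs to implement oneflow::ccl::CudaSend/CudaRecv/CudaAllReduce.... via the nccl API.
  - The npu device needs to implement oneflow::ccl::NpuSend/NpuRecv/NpuAllReduce, etc. via the hccl API.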
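
The base-class/subclass pattern described above can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the `sketch` namespace, `HostAllReduce`, `RegisterAllReduce`, `NewAllReduce`, `KernelCompute`, and the simplified `Launch` signature are assumptions made for illustration, not OneFlow's actual classes or signatures. A plain host-memory backend stands in for a real nccl or hccl implementation so the example runs without any device library.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace, not oneflow::ccl

// Device-agnostic interface: the only type a kernel would depend on.
class AllReduce {
 public:
  virtual ~AllReduce() = default;
  // Sums the per-rank buffers element-wise and writes the sum back to every rank.
  // (Assumed signature; a real backend would take streams and communicators.)
  virtual void Launch(std::vector<std::vector<float>>& rank_buffers) const = 0;
};

// Stand-in backend operating on host memory. A cuda subclass would wrap nccl
// calls here, and an npu subclass would wrap hccl calls.
class HostAllReduce final : public AllReduce {
 public:
  void Launch(std::vector<std::vector<float>>& rank_buffers) const override {
    if (rank_buffers.empty()) { return; }
    std::vector<float> sum(rank_buffers.front().size(), 0.0f);
    for (const auto& buf : rank_buffers) {
      if (buf.size() != sum.size()) { throw std::invalid_argument("buffer size mismatch"); }
      for (std::size_t i = 0; i < sum.size(); ++i) { sum[i] += buf[i]; }
    }
    for (auto& buf : rank_buffers) { buf = sum; }
  }
};

using Factory = std::function<std::unique_ptr<AllReduce>()>;

// Maps a device name to the factory for that device's subclass.
inline std::map<std::string, Factory>& Registry() {
  static std::map<std::string, Factory> registry;
  return registry;
}

inline void RegisterAllReduce(const std::string& device, Factory factory) {
  Registry()[device] = std::move(factory);
}

inline std::unique_ptr<AllReduce> NewAllReduce(const std::string& device) {
  auto it = Registry().find(device);
  if (it == Registry().end()) {
    throw std::runtime_error("no AllReduce registered for device: " + device);
  }
  return it->second();
}

// Kernel-side code: it names only the base class and a device string,
// never a device-specific communication library.
inline void KernelCompute(const std::string& device,
                          std::vector<std::vector<float>>& rank_buffers) {
  NewAllReduce(device)->Launch(rank_buffers);
}

}  // namespace sketch
```

Under these assumptions, adding a new device means registering one more subclass factory; `KernelCompute` and any other caller stay unchanged, which is the decoupling the commit message describes.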
@pull pull bot merged commit 26a393c into QSLee-Net:master Feb 25, 2025
6 participants