[SYCL][Docs] Fix sycl_ext_oneapi_peer_access implementation and extension #19787
base: sycl
Changes from all commits
834a25f
f2d1f8f
3f2e183
f13dc2e
21c380f
@@ -220,6 +220,9 @@ void device::ext_oneapi_enable_peer_access(const device &peer) {
   ur_device_handle_t Device = impl->getHandleRef();
   ur_device_handle_t Peer = peer.impl->getHandleRef();
   if (Device != Peer) {
+    if (!ext_oneapi_can_access_peer(peer))
+      throw sycl::exception(make_error_code(errc::feature_not_supported),
+                            "Peer access is not allowed between the devices.");
@gmlueck - If we worry about the additional overhead of doing this check after requiring the user to do it first, we can make it UB instead.
Shouldn't this be
Are we really worried about performance here? It seems like enabling P2P is something the application will just do once in some initialization code.
Ah, right you are!
I don't particularly worry about the performance, as you are right that it should mainly be a one-and-done call. The check is just redundant when the user is doing it correctly, as they are expected to have already performed it.
     detail::adapter_impl &Adapter = impl->getAdapter();
     Adapter.call<detail::UrApiKind::urUsmP2PEnablePeerAccessExp>(Device, Peer);
   }
@@ -255,9 +258,14 @@ bool device::ext_oneapi_can_access_peer(const device &peer,
   }();
   detail::adapter_impl &Adapter = impl->getAdapter();
   int value = 0;
-  Adapter.call<detail::UrApiKind::urUsmP2PPeerAccessGetInfoExp>(
-      Device, Peer, UrAttr, sizeof(int), &value, nullptr);
+  auto Err =
+      Adapter.call_nocheck<detail::UrApiKind::urUsmP2PPeerAccessGetInfoExp>(
+          Device, Peer, UrAttr, sizeof(int), &value, nullptr);

+  // If the backend doesn't support P2P access, neither does its devices.
+  if (Err == UR_RESULT_ERROR_UNSUPPORTED_FEATURE)
+    return false;
+  Adapter.checkUrResult(Err);
   return value == 1;
 }
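The error-handling pattern in this hunk can be illustrated outside the runtime. The following is a minimal, self-contained sketch and not the DPC++ implementation: `Result`, `queryPeerAccess`, and `canAccessPeer` are hypothetical stand-ins for the UR result code, the unchecked adapter call, and `ext_oneapi_can_access_peer`. It shows the idea of inspecting the raw result first, mapping "unsupported feature" to `false`, and still treating every other failure as an error.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a backend result code (not the real UR enum).
enum class Result { Success, UnsupportedFeature, InvalidDevice };

// Hypothetical stand-in for the unchecked adapter query: it writes the
// attribute into `value` and reports how the call went instead of throwing.
inline Result queryPeerAccess(bool backendHasP2P, bool devicesValid,
                              int &value) {
  if (!devicesValid)
    return Result::InvalidDevice;
  if (!backendHasP2P)
    return Result::UnsupportedFeature;
  value = 1;
  return Result::Success;
}

// Mirrors the shape of the patched query: "unsupported" becomes `false`,
// any other failure is still surfaced as an exception.
inline bool canAccessPeer(bool backendHasP2P, bool devicesValid) {
  int value = 0;
  Result err = queryPeerAccess(backendHasP2P, devicesValid, value);
  // A backend without P2P support means no device pair can have access.
  if (err == Result::UnsupportedFeature)
    return false;
  if (err != Result::Success)
    throw std::runtime_error("peer access query failed");
  return value == 1;
}
```

With this shape, a caller on a backend without P2P support gets a plain `false` from the query rather than an exception, which is what lets the new check in `ext_oneapi_enable_peer_access` report `feature_not_supported` itself.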
What will our implementation do in this case?
I looked back at the discussion in #6104, and it seems like the CUDA backend will return an error in this case. (Maybe that ends up throwing an exception?) What is the Level Zero behavior?
Same question for the case below when you disable access that was never enabled.
I believe CUDA will cause a backend error and L0 will just allow it through without issue. The issue with trying to make the L0 case return an error is that the tracking of which P2P pathways are enabled may become somewhat costly, while the user could do it themselves if they are concerned with overlapping enabling.
I'd like @pbalcer to weigh in here, because these things will actually go through UR rather than straight to L0.
My understanding is that the L0 adapter in UR will already have to keep track of which P2P pathways are enabled, in order to avoid sharing all USM allocations with all devices (see #19257).
If we have to track it anyway, I have no objections to moving back to having an exception for this.
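For readers following the tracking discussion, here is a rough sketch of what bookkeeping for enabled P2P pathways could look like. It is an assumption for illustration only: the thread does not say how the UR Level Zero adapter stores this state, and `PeerAccessTracker`, the integer device ids, and the use of `std::logic_error` are all hypothetical.

```cpp
#include <set>
#include <stdexcept>
#include <utility>

// Hypothetical tracker; devices are identified by plain ints for illustration.
class PeerAccessTracker {
public:
  // Records that `device` may access `peer`; rejects a repeated enable.
  void enable(int device, int peer) {
    if (!enabled_.emplace(device, peer).second)
      throw std::logic_error("peer access already enabled");
  }

  // Removes the record; rejects disabling access that was never enabled.
  void disable(int device, int peer) {
    if (enabled_.erase(std::make_pair(device, peer)) == 0)
      throw std::logic_error("peer access was not enabled");
  }

  bool isEnabled(int device, int peer) const {
    return enabled_.count(std::make_pair(device, peer)) != 0;
  }

private:
  // Directed (device, peer) pairs: access in one direction does not
  // imply access in the other.
  std::set<std::pair<int, int>> enabled_;
};
```

Each enable or disable is a single ordered-set operation on a structure that holds at most one entry per directed device pair, so if this state has to exist anyway, reporting a double enable or a stray disable adds little on top.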