Skip to content

[SYCL][Docs] Fix sycl_ext_oneapi_peer_access implementation and extension #19787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: sycl
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -49,8 +49,8 @@ This extension is implemented and fully supported by DPC++.
This extension is currently implemented in DPC++ for all GPU devices and
backends; however, only the CUDA, HIP and Level Zero backends allows peer to
peer memory access. Other backends report false from the
`ext_oneapi_can_access_peer` query. Peer-Peer memory access is only possible
between two devices from the same backend.
`ext_oneapi_can_access_peer` query, unless both devices are the same. Peer
memory access is only possible between two devices from the same SYCL platform.

== Overview

Expand Down Expand Up @@ -153,13 +153,17 @@ functions may access USM device allocations on the peer device subject to the
normal rules about context as described in the core SYCL specification.
If this device does not support peer access (as defined by
`peer_access::access_supported`), throws an `exception` with the
`errc::feature_not_supported` error code. If access is already enabled,
throws an exception with the `errc::invalid` error code.
`errc::feature_not_supported` error code.

Calling this function with `peer` for which access has already been enabled will
result in undefined behavior.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will our implementation do in this case?

I looked back at the discussion in #6104, and it seems like the CUDA backend will return an error in this case. (Maybe that ends up throwing an exception?) What is the Level Zero behavior?

Same question for the case below when you disable access that was never enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe CUDA will cause a backend error and L0 will just allow it through without issue. The issue with trying to make the L0 case return an error is that the tracking of which P2P pathways are enabled may become somewhat costly, while the user could do it themselves if they are concerned with overlapping enabling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like @pbalcer to weigh in here, because these things will actually go through UR rather than straight to L0.

My understanding is that the L0 adapter in UR will already have to keep track of which P2P pathways are enabled, in order to avoid sharing all USM allocations with all devices (see #19257).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have to track it anyway, I have no objections to moving back to having an exception for this.



|void ext_oneapi_disable_peer_access(const device &peer)
|Disables access to the peer device's memory from this device. If peer access
is not enabled, throws an `exception` with the `errc::invalid` error code.
|Disables access to the peer device's memory from this device.

Calling this function with `peer` for which access is not enabled will result in
undefined behavior.

|===

12 changes: 10 additions & 2 deletions sycl/source/device.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -220,6 +220,9 @@ void device::ext_oneapi_enable_peer_access(const device &peer) {
ur_device_handle_t Device = impl->getHandleRef();
ur_device_handle_t Peer = peer.impl->getHandleRef();
if (Device != Peer) {
if (!ext_oneapi_can_access_peer(peer))
throw sycl::exception(make_error_code(errc::feature_not_supported),
"Peer access is not allowed between the devices.");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gmlueck - If we worry about the additional overhead of doing this check after requiring the user to do it first, we can make it UB instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be feature_not_supported according to the extension spec?

Are we really worried about performance here? It seems like enabling P2P is something the application will just do once in some initialization code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be feature_not_supported according to the extension spec?

Ah, right you are!

Are we really worried about performance here? It seems like enabling P2P is something the application will just do once in some initialization code.

I don't particularly worry about the performance, as you are right that it should mainly be a one-and-done call. It is just redundant when the user is doing it correctly, as they are expected to have already checked the call.

detail::adapter_impl &Adapter = impl->getAdapter();
Adapter.call<detail::UrApiKind::urUsmP2PEnablePeerAccessExp>(Device, Peer);
}
Expand Down Expand Up @@ -255,9 +258,14 @@ bool device::ext_oneapi_can_access_peer(const device &peer,
}();
detail::adapter_impl &Adapter = impl->getAdapter();
int value = 0;
Adapter.call<detail::UrApiKind::urUsmP2PPeerAccessGetInfoExp>(
Device, Peer, UrAttr, sizeof(int), &value, nullptr);
auto Err =
Adapter.call_nocheck<detail::UrApiKind::urUsmP2PPeerAccessGetInfoExp>(
Device, Peer, UrAttr, sizeof(int), &value, nullptr);

// If the backend doesn't support P2P access, neither does its devices.
if (Err == UR_RESULT_ERROR_UNSUPPORTED_FEATURE)
return false;
Adapter.checkUrResult(Err);
return value == 1;
}

Expand Down
1 change: 1 addition & 0 deletions sycl/source/feature_test.hpp.in
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ inline namespace _V1 {
#define SYCL_KHR_DEFAULT_CONTEXT 1
#define SYCL_EXT_INTEL_EVENT_MODE 1
#define SYCL_EXT_ONEAPI_TANGLE 1
#define SYCL_EXT_ONEAPI_PEER_ACCESS 1

// Unfinished KHR extensions. These extensions are only available if the
// __DPCPP_ENABLE_UNFINISHED_KHR_EXTENSIONS macro is defined.
Expand Down
4 changes: 1 addition & 3 deletions sycl/test-e2e/USM/P2P/p2p_access.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
// REQUIRES: cuda || hip || level_zero
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out

Expand All @@ -10,8 +9,7 @@
using namespace sycl;

int main() {

auto Devs = platform(gpu_selector_v).get_devices(info::device_type::gpu);
auto Devs = platform().get_devices();

if (Devs.size() < 2) {
std::cout << "Cannot test P2P capabilities, at least two devices are "
Expand Down
4 changes: 1 addition & 3 deletions sycl/test-e2e/USM/P2P/p2p_atomics.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
// REQUIRES: cuda || hip || level_zero
// RUN: %{build} %if target-nvidia %{ -Xsycl-target-backend=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_61 %} -o %t.out
// RUN: %{run} %t.out

Expand All @@ -18,8 +17,7 @@ using namespace sycl;
constexpr size_t N = 512;

int main() {

auto Devs = platform(gpu_selector_v).get_devices(info::device_type::gpu);
auto Devs = platform().get_devices();

if (Devs.size() < 2) {
std::cout << "Cannot test P2P capabilities, at least two devices are "
Expand Down
4 changes: 1 addition & 3 deletions sycl/test-e2e/USM/P2P/p2p_copy.cpp
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
// REQUIRES: cuda || hip || level_zero
// RUN: %{build} -o %t.out
// RUN: %{run} %t.out

Expand All @@ -15,8 +14,7 @@ using namespace sycl;
constexpr int N = 100;

int main() {

auto Devs = platform(gpu_selector_v).get_devices(info::device_type::gpu);
auto Devs = platform().get_devices();

if (Devs.size() < 2) {
std::cout << "Cannot test P2P capabilities, at least two devices are "
Expand Down
6 changes: 0 additions & 6 deletions unified-runtime/source/adapters/opencl/usm_p2p.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -13,16 +13,12 @@
UR_APIEXPORT ur_result_t UR_APICALL
urUsmP2PEnablePeerAccessExp([[maybe_unused]] ur_device_handle_t commandDevice,
[[maybe_unused]] ur_device_handle_t peerDevice) {

die("Experimental P2P feature is not implemented for OpenCL adapter.");
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}

UR_APIEXPORT ur_result_t UR_APICALL
urUsmP2PDisablePeerAccessExp([[maybe_unused]] ur_device_handle_t commandDevice,
[[maybe_unused]] ur_device_handle_t peerDevice) {

die("Experimental P2P feature is not implemented for OpenCL adapter.");
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}

Expand All @@ -32,7 +28,5 @@ UR_APIEXPORT ur_result_t UR_APICALL urUsmP2PPeerAccessGetInfoExp(
[[maybe_unused]] ur_exp_peer_info_t propName,
[[maybe_unused]] size_t propSize, [[maybe_unused]] void *pPropValue,
[[maybe_unused]] size_t *pPropSizeRet) {

die("Experimental P2P feature is not implemented for OpenCL adapter.");
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
}
Loading