-
Notifications
You must be signed in to change notification settings - Fork 798
[SYCL][NVPTX][AMDGCN] Move devicelib cmath to header #18706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
42 commits
Select commit
Hold shift + click to select a range
b139fa4
[SYCL][NVPTX][AMDGCN] Move devicelib cmath to header
npmiller e942076
[SYCL] Fixup attribute handling
npmiller d192f33
[SYCL] Use hasAttr instead of hasExplicitAttr
npmiller a82cebc
[SYCL] Update fallback header
npmiller b3574fb
[SYCL] Remove sycl-libdevice-cmath.cpp test
npmiller 3d2aa44
[SYCL] Add missing abs
npmiller 8ce1d93
[SYCL] Fix overloadble requirement for sycl_device_only
npmiller d3b2988
[SYCL] Add device only docs
npmiller 350aec6
[SYCL] Add initial test for sycl-device-only
npmiller 2c119ae
[SYCL] Add diagnostic for host side sycl_device_only
npmiller 605eed6
[SYCL] Block sycl_device_only emission on host side
npmiller 172e7f7
Revert "[SYCL] Add diagnostic for host side sycl_device_only"
npmiller bb9fc66
[SYCL] Cleanup documentation and comments
npmiller 06474cb
[SYCL] Fix attribute emission handling
npmiller 59c6edf
[SYCL] Fix formatting
npmiller 1a6e0f5
[SYCL] Rename variable
npmiller 5df06cf
[SYCL] More fallback header improvements
npmiller e0cd399
[SYCL] Add nearbyint and rint to devicelib tests
npmiller 8affa7a
[SYCL] Fix formatting
npmiller 8d9733d
[SYCL] Add SYCL_EXTERNAL to neabyint and rint
npmiller ec84f57
[SYCL][E2E] Remove nearbyint from test and stl wrapper
npmiller 9db7bde
[SYCL] Don't leak macros from cmath-fallback.h
npmiller d8a273b
[SYCL] Fix if/else/return formatting
npmiller f53840b
[SYCL] Cleanup use of isSYCL() in SemaOverload
npmiller 8477da7
[SYCL] Add missing copyright notices
npmiller 34d26f4
[SYCL] Re-use DDI in CodeGenModule
npmiller 4a576c4
[SYCL] Cleanup and complete fallback header
npmiller 728d557
[SYCL] Add cmath-fallback header test
npmiller 6ac4ba1
[SYCL] Fix formatting
npmiller d9713d5
[SYCL] Fixup promotion overloads
npmiller 56d0e33
[SYCL] Skip sycl.hpp in lit test
npmiller 459ed03
[SYCL] Fixup SFINAE
npmiller a84f1e3
[SYCL] Minor cleanups
npmiller b8bfac4
Merge branch 'sycl' into rip-libdevice
npmiller 2bdd6a5
[SYCL] Rename cmath header
npmiller ea6a3c5
[SYCL] Switch attribute from GNU to Clang
npmiller 0a1508f
Merge branch 'sycl' into rip-libdevice
npmiller fba7042
[SYCL] Update copyright header
npmiller 5d13b1d
Mark __sycl_cmath_wrapper_impl.hpp as expected to fail self-contained…
bader ef1e406
Make __sycl_cmath_wrapper_impl.hpp more upstream-able.
bader be69cdd
Update the header guard macro name.
bader 6d52c97
[SYCL] Make sycl_device and sycl_device_only incompatible
npmiller File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
// RUN: %clang_cc1 -fsycl-is-device -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECKD | ||
// RUN: %clang_cc1 -fsycl-is-host -triple spir64-unknown-unknown -disable-llvm-passes -emit-llvm %s -o - | FileCheck %s --check-prefix=CHECKH | ||
// Test code generation for sycl_device_only attribute. | ||
|
||
// Verify that the device overload is used on device. | ||
// | ||
// CHECK-LABEL: _Z3fooi | ||
// CHECKH: %add = add nsw i32 %0, 10 | ||
// CHECKD: %add = add nsw i32 %0, 20 | ||
int foo(int a) { return a + 10; } | ||
__attribute__((sycl_device_only)) int foo(int a) { return a + 20; } | ||
|
||
// Use a `sycl_device` function as entry point | ||
__attribute__((sycl_device)) int bar(int b) { return foo(b); } | ||
|
||
// Verify that the order of declaration doesn't change the behavior. | ||
// | ||
// CHECK-LABEL: _Z3fooswapi | ||
// CHECKH: %add = add nsw i32 %0, 10 | ||
// CHECKD: %add = add nsw i32 %0, 20 | ||
__attribute__((sycl_device_only)) int fooswap(int a) { return a + 20; } | ||
int fooswap(int a) { return a + 10; } | ||
|
||
// Use a `sycl_device` function as entry point. | ||
__attribute__((sycl_device)) int barswap(int b) { return fooswap(b); } | ||
|
||
// Verify that in extern C the attribute enables mangling. | ||
extern "C" { | ||
// CHECK-LABEL: _Z3fooci | ||
// CHECKH: %add = add nsw i32 %0, 10 | ||
// CHECKD: %add = add nsw i32 %0, 20 | ||
int fooc(int a) { return a + 10; } | ||
__attribute__((sycl_device_only)) int fooc(int a) { return a + 20; } | ||
|
||
// Use a `sycl_device` function as entry point. | ||
__attribute__((sycl_device)) int barc(int b) { return fooc(b); } | ||
} |
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a potentially expensive thing to do for every function to be emitted.
The changes to overload resolution should already ensure that the right declaration is placed on the deferred decls list due to ODR-use. I assume that the only other way that a function gets on this list is if it is a non-inline function defined in the translation unit. But those functions shouldn't be emitted for a SYCL device unless ODR-used anyway. So what are the scenarios that require checking for the "wrong" decl being added to the deferred decls list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good point. I ran into this while trying to be thorough in the
sycl-device-only.cpp
lit test.But in that lit test we're compiling with
-cc1 -fsycl-is-device
, which I believe will skip the regular SYCL host/device split stage that would get rid of the extra overload.So I think you're correct that this case should never happen in regular SYCL compilation, but it's very easy to trigger with
-fsycl-is-device
, but it's probably fair to say that this type of device only code is invalid. I'll look into removing this condition block and adding a diagnostic instead.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps one possibility is a function that is declared
SYCL_EXTERNAL
and defined in the translation unit? I assume those functions are emitted for the device target regardless of ODR-use. Something like the following would then be a problem (and should thus be diagnosed as an error to prevent a symbol conflict).Related question: Is it ok to declare a function
SYCL_EXTERNAL
and later define it assycl_device_only
? This is probably worth a test.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does trigger the issue as well, but would get caught in the diagnsotic I'm looking at.
I believe this should be fine, however without this condition block, I just found out that this:
ends up selecting the "host"
g
in device code which is quite strange.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some more investigation, it seems to me that this might still be the least disruptive solution.
Apparently the host overloads are still around when we reach CodeGen, even if they're not being called from the kernels. So we need to handle that here one way or another. It differs from regular overloading because we're overloading on an attribute so both functions have identical mangling. While at the AST level we're calling the right overload, when we reach CodeGen it mostly relies on mangling to connect the functions to their uses, which means that we lose the information of which overload was meant to be called and emitting either one can work.
So without this special handling CodeGen will just emit one or the other overload based on which one it processes first.
For example in my test sample, without this special handling, this:
Ends up emitting the
sycl_device_only
overload, but this:Ends up emitting the "host" overload.
Adding
sycl_device
in the mix just makes the problem more obvious because it forces the compiler to try to actually emit both overloads which results in an error, but even without it the issue is there.I'm sure there might be other, potentially cleaner ways of handling this, but I suspect they might require larger changes and be less contained. I'm also not seeing any obvious compilation performance issues, so I think we should be okay to go with this, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that analysis @npmiller! I went to open an issue to follow up on this, but then I saw that we already do effectively the same thing for SYCL-CUDA
llvm/clang/lib/CodeGen/CodeGenModule.cpp
Lines 4532 to 4562 in 4f0afd5
Based on that, I'm inclined to not worry about this; particularly since I don't see a strategy to do better. However, two things I'd like to check first to ensure there aren't additional issues:
Manglings
,MangledDeclNames
, andDeferredDecls
will remain consistent in that lookup for a mangled name in each will return a consistent set of declarations for the right entity. Does that sound right?EmitGlobal()
is called for the device-only definition. Can you try the following test case to ensure a symbol conflict isn't reported and that bothg()
andh()
return 2?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair the SYCL for CUDA support doing something similar but it's pretty hacky itself and it's also guarded behind a flag so it's a bit more contained than this patch (I believe without the sycl-cuda compat flag
LangOpts.isSYCL() && LangOpts.CUDA
would never be true, even when compiling SYCL for Nvidia targets).I think a possible option to do better would be to hook the attribute into the function multiversioning support, I only noticed this recently but it seems pretty similar to what we're trying to do here. It might be a fair bit of work though as it looks like it has a lot of special handling in both Sema and CodeGen.
-fsycl -fsycl-device-only -Xclang -disable-llvm-passes -S -emit-llvm
):There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Function multiversioning depends on symbol mangling to differentiate the target functions with selection of which function to call decided at run-time. Unfortunately, I don't think it will help with this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having another look at (1) I think it still sounds right, and since multi-versioning doesn't apply I think we should be good with this solution, thanks for the reviews and discussion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Thanks for following up!