-
Notifications
You must be signed in to change notification settings - Fork 540
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor C++ language extensions and C++ support #3673
base: docs/develop
Are you sure you want to change the base?
Conversation
e911a63
to
da93aea
Compare
0c52420
to
31fa125
Compare
760259e
to
6d6c430
Compare
058fdde
to
3eddfa4
Compare
3eddfa4
to
c485169
Compare
21b4cb3
to
021e579
Compare
@MKKnorr Please rebase on docs/develop. |
ac2b239
to
761c1f8
Compare
58960c9
to
4b37cf5
Compare
4b37cf5
to
0cf8414
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me. I left a few comments.
HIP supports ``__threadfence()``, ``__threadfence_block()`` and | ||
``__threadfence_system()``. | ||
|
||
On AMD devices, ``__threadfence_system()``, has restrictions and therefore needs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it is referring to some implicit knowledge the user must have relative to threadfence_system? What are the "restrictions"? And is it appropriate to call it a workaround in the docs? Seems like a workaround is good for an issue or bug report, but not for user docs. Perhaps "requires the following process"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish I had more information on this, but I couldn't find information proving or disproving this, so I just reformatted the already existing content
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requires the following process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll ask around what the current status on this feature is, maybe we can get rid of this section altogether, but I'll postpone it for now
ef46f97
to
00a1e34
Compare
00a1e34
to
b91f075
Compare
b91f075
to
4ffc403
Compare
4ffc403
to
5cc8184
Compare
|
||
.. code-block:: cpp | ||
|
||
#include <hip/hip_runtime.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we modify this to add host and device functions as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is specifically about launching a kernel and I'd like to keep it as concise as possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense, but this kind of felt like a build up, there is description of host, device and global functions but example only shows global function. I think user reading through it, when they reach this section should get a complete picture of all three in action.
I'll leave it up to you, but if I was reading this, I wish there would be a more complete example which encompasses all that I just read.
|
||
GPU multiprocessors have a fixed pool of resources (primarily registers and | ||
shared memory) which are shared by the actively running warps. Using more | ||
resources can increase IPC of the kernel but reduces the resources available for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can increase IPC of the kernel
What does IPC mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instructions per cycle. Rephrased the sentence to replace the acronym
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we use such words (IPC incase of kernels?). Can we measure IPC for kernels? I would recommend using something like 'occupancy' which is a measurable metric.
automatically reduce shared memory usage. The compilation fails, if the compiler | ||
can not generate code that satisfies the launch bounds. | ||
|
||
On NVCC this parameter maps to the ``.maxntid`` PTX directive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are mentioning PTX directive, should we also mention the amdgpu_flat_work_group_size
function attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but this also is not the appropriate place in general. I already opened an issue to move this to the compiler documentation, as this does not directly relate to the language
Porting from CUDA __launch_bounds | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
CUDA defines the ``__launch_bounds`` qualifier which works similar to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CUDA support __launch_bounds (without the trailing underscore?) can you point me to the CUDA documentation for it, cant seem to find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could have sworn I've seen it somewhere, but it seems you're right, it also has trailing underscores. Fixed it.
Managed memory is a special qualifier, that makes the marked memory available on | ||
the device and on the host. For more details see :ref:`unified_memory`. | ||
|
||
__restrict__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isnt restrict more of a clang extension than HIP, should we include it in HIP documentation? or alteast mention that here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is somewhat of a language extension, but also compiler specific/a hint to the compiler. However since CUDA requires this, I'm quite sure HIP also has to support it, so I'd say it has its place in this documentation
reduced support of the C++ standard library in device code, as functions are | ||
only compiled for the host by default. There are ongoing efforts to implement | ||
C++ standard library functionality with `libhipcxx | ||
<https://github.com/ROCm/libhipcxx>`_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something worth mentioning here, that STL features which are guaranteed to resolve at compile time (for example type_traits) are supported on device code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is also noted under C++11 support - constexpr, but I added an additional note here.
5cc8184
to
7ee5328
Compare
7ee5328
to
92410f6
Compare
92410f6
to
e9a6c15
Compare
No description provided.