-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory overhead for HIP backends on MI300A GPUs #1734
base: main
Are you sure you want to change the base?
Conversation
@@ -114,6 +107,7 @@ static int CeedBasisApplyAtPointsCore_Hip(CeedBasis basis, bool apply_add, const | |||
const CeedScalar *d_x, *d_u; | |||
CeedScalar *d_v; | |||
CeedBasis_Hip *data; | |||
Ceed_Hip *hip_data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another stray
@@ -126,6 +120,7 @@ static int CeedBasisApplyAtPointsCore_Hip(CeedBasis basis, bool apply_add, const | |||
} | |||
|
|||
CeedCallBackend(CeedBasisGetCeed(basis, &ceed)); | |||
CeedCallBackend(CeedGetData(ceed, &hip_data)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
9dd589f
to
35926f8
Compare
35926f8
to
5ad657d
Compare
CeedVector_Hip *impl; | ||
|
||
CeedCallBackend(CeedVectorGetData(vec, &impl)); | ||
CeedCallHip(CeedVectorReturnCeed(vec), hipDeviceSynchronize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ratel seems to work fine without this line, and is faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does CeedVectorSyncArray
mean that one could immediately start an MPI_Send
? If the host doesn't know that the previous kernel (writing to the array) has completed, then it would be racy to call MPI_Send
. (Might be rare to trip, but we don't want that kind of bug.)
If our sends are using a kernel for packing (on the same stream), then the host doesn't need to know when the earlier stuff completes, but we still need to sync after the packing kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a fair point, I think that we need to be a bit more careful and only sync when the host needs the data. Otherwise this acts as a hard sync with the GPU, which seems to have performance impacts.
FYI, generally prefer rebase to merge for dev branches. It doesn't matter for squash-merges (the commit history gets nuked anyways), but for normal merges it helps the git history be more regular. |
Yeah generally I agree - I need to strip down this branch and rebuild it probably, it's currently a mess due to changes at the AMD workshop. |
Prevents double allocations for
CeedVector
when using HIP vector with unified addressing and XNACK.Also, updates more of the HIP vector operations to use
hipBLAS
functions rather than custom kernels.