
Reduce memory overhead for HIP backends on MI300A GPUs #1734

Open

zatkins-dev wants to merge 10 commits into main
Conversation

zatkins-dev (Collaborator) commented Jan 24, 2025

Prevents double allocations for CeedVector when using HIP vectors with unified addressing and XNACK.
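
For context, a minimal sketch of the idea (not the PR's actual code; device_has_unified_memory and alloc_vector_storage are hypothetical helpers): when the device reports host-accessible unified memory, as on MI300A with XNACK enabled, a single allocation can back both the host and device views of a vector instead of two mirrored buffers.

#include <hip/hip_runtime.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: does this device support managed/unified memory? */
static bool device_has_unified_memory(int device_id) {
  int value = 0;
  hipDeviceGetAttribute(&value, hipDeviceAttributeManagedMemory, device_id);
  return value != 0;
}

/* Hypothetical helper: allocate once when unified memory is available,
   otherwise fall back to separate (mirrored) host and device buffers. */
static int alloc_vector_storage(size_t length, double **host_array, double **device_array, int device_id) {
  if (device_has_unified_memory(device_id)) {
    if (hipMallocManaged((void **)device_array, length * sizeof(double), hipMemAttachGlobal) != hipSuccess) return 1;
    *host_array = *device_array;  /* one allocation, two views: no double allocation, no copies */
  } else {
    *host_array = malloc(length * sizeof(double));
    if (!*host_array || hipMalloc((void **)device_array, length * sizeof(double)) != hipSuccess) return 1;
  }
  return 0;
}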

Also, updates more of the HIP vector operations to use hipBLAS functions rather than custom kernels.
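
As an illustration of the second point (a sketch, not the PR's code; scale_device_array is a hypothetical helper, and the hipBLAS include path depends on the ROCm version), a vector scaling that would otherwise need a hand-written kernel can be delegated to hipBLAS:

#include <hipblas/hipblas.h>

/* Hypothetical helper: y <- alpha * y on device memory via hipBLAS instead of a custom kernel. */
static int scale_device_array(hipblasHandle_t handle, int n, double alpha, double *d_y) {
  /* hipblasDscal scales a device array in place; alpha is read from the host
     (default HIPBLAS_POINTER_MODE_HOST). */
  return (hipblasDscal(handle, n, &alpha, d_y, 1) == HIPBLAS_STATUS_SUCCESS) ? 0 : 1;
}
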

@@ -114,6 +107,7 @@ static int CeedBasisApplyAtPointsCore_Hip(CeedBasis basis, bool apply_add, const
const CeedScalar *d_x, *d_u;
CeedScalar *d_v;
CeedBasis_Hip *data;
Ceed_Hip *hip_data;
Member

another stray

@@ -126,6 +120,7 @@ static int CeedBasisApplyAtPointsCore_Hip(CeedBasis basis, bool apply_add, const
}

CeedCallBackend(CeedBasisGetCeed(basis, &ceed));
CeedCallBackend(CeedGetData(ceed, &hip_data));
Member

and here

CeedVector_Hip *impl;

CeedCallBackend(CeedVectorGetData(vec, &impl));
CeedCallHip(CeedVectorReturnCeed(vec), hipDeviceSynchronize());
Collaborator (Author)

Ratel seems to work fine without this line, and is faster

Member

Does CeedVectorSyncArray mean that one could immediately start an MPI_Send? If the host doesn't know that the previous kernel (writing to the array) has completed, then it would be racy to call MPI_Send. (Might be rare to trip, but we don't want that kind of bug.)

If our sends are using a kernel for packing (on the same stream), then the host doesn't need to know when the earlier stuff completes, but we still need to sync after the packing kernel.
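
A sketch of the hazard being described (hypothetical helper; it assumes the buffer is host-accessible or the MPI library is GPU-aware): the host must wait for the kernel that produced the data before handing the buffer to MPI_Send.

#include <hip/hip_runtime.h>
#include <mpi.h>

/* Hypothetical helper: send an array that was just written by a kernel on `stream`. */
static void send_after_kernel(const double *array, int n, int dest, MPI_Comm comm, hipStream_t stream) {
  /* Without this synchronization the kernel may still be writing `array`
     when MPI_Send reads it, which is exactly the race described above. */
  hipStreamSynchronize(stream);
  MPI_Send(array, n, MPI_DOUBLE, dest, /* tag */ 0, comm);
}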

Collaborator (Author)

That's a fair point. I think we need to be a bit more careful and only sync when the host needs the data; otherwise this acts as a hard sync with the GPU, which seems to have performance impacts.
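
A sketch of that "sync only when the host needs the data" idea (hypothetical names, not the backend's actual data structures): track outstanding device writes and synchronize lazily when a host pointer is actually requested.

#include <hip/hip_runtime.h>
#include <stdbool.h>

/* Hypothetical toy vector: one host-visible (unified) array plus bookkeeping. */
typedef struct {
  double     *array;            /* host-visible storage (unified memory assumed) */
  bool        device_is_dirty;  /* true if device kernels may still be writing */
  hipStream_t stream;           /* stream the device work was enqueued on */
} ToyVector;

/* Called by device-side operations: remember that a sync will be needed later. */
static void mark_device_write(ToyVector *vec) { vec->device_is_dirty = true; }

/* Called when the host actually reads the data: the hard sync happens only here. */
static double *get_host_array(ToyVector *vec) {
  if (vec->device_is_dirty) {
    hipStreamSynchronize(vec->stream);
    vec->device_is_dirty = false;
  }
  return vec->array;
}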

jrwrigh (Collaborator) commented Feb 26, 2025

FYI, generally prefer rebase to merge for dev branches. It doesn't matter for squash merges (the commit history gets nuked anyway), but for normal merges it helps keep the git history more regular.

zatkins-dev (Collaborator, Author)
> FYI, generally prefer rebase to merge for dev branches. It doesn't matter for squash merges (the commit history gets nuked anyway), but for normal merges it helps keep the git history more regular.

Yeah, generally I agree. I probably need to strip down this branch and rebuild it; it's currently a mess due to changes at the AMD workshop.

4 participants