NOTE: these exercises have been tested on MI210 and MI300A accelerators using a container environment.
To see details on the container environment (such as operating system and modules available) please see
on this repo.
cd $HOME/HPCTraining/Examples
git clone Kokkos_build
cd Kokkos_build
Build Kokkos with OpenMP backend
mkdir build_openmp && cd build_openmp
make -j 8
make install
cd ..
Build Kokkos with HIP backend
mkdir build_hip && cd build_hip
make -j 8; make install
cd ..
Set Kokkos_DIR to point to the external Kokkos package to use
export Kokkos_DIR=${HOME}/Kokkos_HIP
Get example
git clone --recursive Chapter13
cd Chapter13/Kokkos/StreamTriad
cd Orig
Test serial version with
mkdir build && cd build; cmake ..; make; ./StreamTriad
If the run fails (SEGV), try reducing the size of the arrays, by reducing the value of the nsize variable in
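For orientation, the kernel being ported in this exercise is the stream triad, which computes a[i] = b[i] + scalar * c[i] over three large arrays. The following is a minimal standalone sketch of that loop, not the repository's source: the function name stream_triad and the use of std::vector are assumptions made for illustration. Heap-allocated vectors are used here because very large fixed-size arrays are one common cause of the crash described above.

```cpp
#include <cstddef>
#include <vector>

// Stream triad: a[i] = b[i] + scalar * c[i].
// Hypothetical helper for illustration; the exercise source may be
// organized differently.
void stream_triad(std::vector<double>& a,
                  const std::vector<double>& b,
                  const std::vector<double>& c,
                  double scalar) {
    // Two loads and one store per element, so the loop is limited by
    // memory bandwidth rather than by arithmetic.
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

All three vectors are expected to have the same length; a smaller length is the equivalent of reducing nsize.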
Add to CMakeLists.txt
(add) find_package(Kokkos REQUIRED)
add_executable(StreamTriad ....)
(add) target_link_libraries(StreamTriad Kokkos::kokkos)
Retest with
cmake ..; make
and run ./StreamTriad again
See Ver1 for the solution; these modifications have already been made there.
(Peek at Ver4/ to see the end result.)
Add include file
#include <Kokkos_Core.hpp>
Add initialize and finalize
Kokkos::initialize(argc, argv); {
} Kokkos::finalize();
Replace static array declarations with Kokkos views
int nsize=80000000;
Kokkos::View<double *> a( "a", nsize);
Kokkos::View<double *> b( "b", nsize);
Kokkos::View<double *> c( "c", nsize);
Rebuild and run
CXX=hipcc cmake ..
Change for loops to Kokkos parallel fors.
At start of loop
Kokkos::parallel_for(nsize, KOKKOS_LAMBDA (int i) {
At end of loop, replace closing brace with
});
Rebuild and run. Add environment variables as the Kokkos message suggests. For OpenMP 4.0 or later:
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
For OpenMP 3.1, use this instead:
export OMP_PROC_BIND=true
How much speedup do you observe?
Add Kokkos calls
Kokkos::Timer timer;
timer.reset(); // for timer start
time_sum += timer.seconds();
Remove the old timer code
#include <timer.h>
struct timespec tstart;
time_sum += cpu_timer_stop(tstart);
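The timing pattern in this step (start a timer, run the kernel, add the elapsed seconds to time_sum) can be sketched without Kokkos as follows. This is a stand-in that uses std::chrono in place of Kokkos::Timer so that it compiles on its own; the helper name average_seconds and its parameters are hypothetical and do not come from the exercise source.

```cpp
#include <chrono>

// Hypothetical helper: runs `kernel` ntimes and returns the mean
// wall-clock seconds per run, accumulating into time_sum in the same
// way as the exercise code.
template <typename Kernel>
double average_seconds(int ntimes, Kernel kernel) {
    double time_sum = 0.0;
    for (int k = 0; k < ntimes; ++k) {
        // Plays the role of timer.reset()
        auto tstart = std::chrono::steady_clock::now();
        kernel();
        // Plays the role of time_sum += timer.seconds()
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - tstart;
        time_sum += elapsed.count();
    }
    return time_sum / ntimes;
}
```

When the timed kernel is a Kokkos::parallel_for on a GPU backend, the launch may return before the work has finished, so a Kokkos::fence() before reading the timer is usually needed for the measurement to cover the whole kernel.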
Find out how many virtual cores are on your CPU
First run with a single processor:
Average runtime ___________
Then run the OpenMP version:
Average runtime ___________
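Once both average runtimes are recorded, the speedup is the serial time divided by the parallel time. The sketch below also converts a runtime into an effective memory bandwidth, assuming the usual STREAM convention of 24 bytes moved per element for the triad (two 8-byte loads and one 8-byte store). Both function names are hypothetical and are not part of the exercise source.

```cpp
#include <cstddef>

// Hypothetical helper: ratio of serial to parallel runtime.
// Both arguments must be in the same unit.
double speedup(double serial_seconds, double parallel_seconds) {
    return serial_seconds / parallel_seconds;
}

// Hypothetical helper: effective triad bandwidth in GB/s (1 GB = 1e9 bytes),
// assuming 3 double-precision arrays are touched once per element.
double triad_bandwidth_gbs(std::size_t nsize, double seconds) {
    const double bytes_moved = 3.0 * sizeof(double) * static_cast<double>(nsize);
    return bytes_moved / seconds / 1.0e9;
}
```

Because the triad is bandwidth bound, comparing the computed bandwidth with the hardware's peak memory bandwidth is often more informative than the raw runtime.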
- Rebuild Stream Triad using Kokkos build with HIP
Set Kokkos_DIR to point to the external Kokkos build with HIP
export Kokkos_DIR=${HOME}/Kokkos_HIP
cmake ..
- Run and measure performance with AMD Radeon GPUs
HIP build with ROCm
Ver4 - Average runtime is ______ msecs