NOTE: these exercises have been tested on MI210 and MI300A accelerators using a container environment.
To see details on the container environment (such as operating system and modules available) please see README.md
on this repo.
cd $HOME/HPCTraining/Examples
git clone https://github.com/kokkos/kokkos Kokkos_build
cd Kokkos_build
Build Kokkos with OpenMP backend
mkdir build_openmp && cd build_openmp
cmake -DCMAKE_INSTALL_PREFIX=${HOME}/Kokkos_OpenMP -DKokkos_ENABLE_SERIAL=On \
-DKokkos_ENABLE_OPENMP=On ..
make -j 8
make install
cd ..
Build Kokkos with HIP backend
mkdir build_hip && cd build_hip
cmake -DCMAKE_INSTALL_PREFIX=${HOME}/Kokkos_HIP -DKokkos_ENABLE_SERIAL=ON \
-DKokkos_ENABLE_HIP=ON -DKokkos_ARCH_ZEN=ON -DKokkos_ARCH_VEGA90A=ON \
-DCMAKE_CXX_COMPILER=hipcc ..
make -j 8; make install
cd ..
Set Kokkos_DIR to point to external Kokkos package to use
export Kokkos_DIR=${HOME}/Kokkos_HIP
Get example
git clone --recursive https://github.com/EssentialsOfParallelComputing/Chapter13 Chapter13
cd Chapter13/Kokkos/StreamTriad
cd Orig
Test serial version with
mkdir build && cd build; cmake ..; make; ./StreamTriad
If the run fails (SEGV), try reducing the size of the arrays, by reducing the value of the nsize variable in StreamTriad.cc.
Add to CMakeLists.txt
(add) find_package(Kokkos REQUIRED)
add_executables(StreamTriad ....)
(add) target_link_libraries(StreamTriad Kokkos::kokkos)
Retest with
cmake ..; make
and run ./StreamTriad again
Check Ver1 for solution. These modifications have already been made in Ver1 version.
(peek at ver4/StreamTriad.cc to see the end result)
Add include file
#include <Kokkos_Core.hpp>
Add initialize and finalize
Kokkos::initialize(argc, argv); {
} Kokkos::finalize();
Replace static array declarations with Kokkos views
int nsize=80000000;
Kokkos::View<double *> a( "a", nsize);
Kokkos::View<double *> b( "b", nsize);
Kokkos::View<double *> c( "c", nsize);
Rebuild and run
CXX=hipcc cmake ..
make
./StreamTriad
Change for loops to Kokkos parallel fors.
At start of loop
Kokkos::parallel_for(nsize, KOKKOS_LAMBDA (int i) {
At end of loop, replace closing brace with
});
Rebuild and run. Add environment variables as Kokkos message suggests:
export OMP_PROC_BIND=spread
export OMP_PLACES=threads
export OMP_PROC_BIND=true
How much speedup do you observe?
Add Kokkos calls
Kokkos::Timer timer;
timer.reset(); // for timer start
time_sum += timer.seconds();
Remove
#include <timer.h>
struct timespec tstart;
cpu_timer_start(&tstart);
time_sum += cpu_timer_stop(tstart);
Find out how many virtual cores are on your CPU
lscpu
First run with a single processor:
Average runtime ___________
Then run the OpenMP version:
Average runtime ___________
- Rebuild Stream Triad using Kokkos build with HIP
Set Kokkos_DIR to point to external Kokkos build with HIP
export Kokkos_DIR=${HOME}/Kokkos_HIP/lib/cmake/Kokkos_HIP
cmake ..
make
- Run and measure performance with AMD Radeon GPUs
HIP build with ROCm
Ver4 - Average runtime is ______ msecs