forked from celerity/ndzip

ndzip: A High-Throughput Parallel Lossless Compressor for Scientific Data

ndzip compresses and decompresses multidimensional univariate grids of single- and double-precision IEEE 754 floating-point data. We implement

  • a single-threaded CPU compressor
  • an OpenMP-backed multi-threaded compressor
  • a SYCL-based GPU compressor (currently hipSYCL + NVIDIA only)
  • a CUDA-based GPU compressor

All variants generate and decode a bit-identical compressed stream.

ndzip is currently a research project with the primary use case of speeding up distributed HPC applications by increasing effective interconnect bandwidth.

Prerequisites

  • CMake >= 3.15
  • Clang >= 10.0.0
  • Linux (tested on x86_64 and POWER9)
  • Boost >= 1.66
  • Catch2 >= 2.13.3 (optional, for unit tests and microbenchmarks)

Additionally, for GPU support

  • CUDA >= 11.0 (not officially compatible with Clang 10/11, but a lower version will optimize insufficiently!)
  • An NVIDIA GPU with Compute Capability >= 6.1
  • For the SYCL version: hipSYCL >= 8756087f

Building

Make sure to set the right build type and enable the full instruction set of the target CPU architecture:

-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=native"

If unit tests and microbenchmarks should also be built, add

-DNDZIP_BUILD_TEST=YES

Depending on your system, you might have to configure the correct C/C++ compilers to use (Clang >= 10.0 and GCC >= 8.2 have been known to work in the past):

-DCMAKE_C_COMPILER=/path/to/cc -DCMAKE_CXX_COMPILER=/path/to/c++
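
Taken together, the flags above could be combined into one configure-and-build step. The following is only a sketch: ndzip_configure is a hypothetical helper name, the build directory follows the GPU examples in this README, and the compilers default to cc and c++ unless CC and CXX are set.

```shell
# Hypothetical helper: configure and build ndzip with the flags discussed above.
# CC and CXX may point at specific compilers; otherwise cc and c++ are assumed.
ndzip_configure() {
    cmake -B build \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_CXX_FLAGS="-march=native" \
        -DNDZIP_BUILD_TEST=YES \
        -DCMAKE_C_COMPILER="${CC:-cc}" \
        -DCMAKE_CXX_COMPILER="${CXX:-c++}" &&
    cmake --build build -j
}

# Example (not run here): CC=clang CXX=clang++ ndzip_configure
```

Drop the -DNDZIP_BUILD_TEST=YES line if unit tests and microbenchmarks are not needed.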

For GPU support with SYCL

  1. Build and install hipSYCL
git clone https://github.com/illuhad/hipSYCL
cd hipSYCL
cmake -B build -DCMAKE_INSTALL_PREFIX=../hipSYCL-install -DWITH_CUDA_BACKEND=YES -DCMAKE_BUILD_TYPE=Release
cmake --build build --target install -j
  2. Build ndzip with SYCL
cmake -B build -DCMAKE_PREFIX_PATH='../hipSYCL-install/lib/cmake' -DHIPSYCL_PLATFORM=cuda -DCMAKE_CUDA_ARCHITECTURES=75 -DHIPSYCL_GPU_ARCH=sm_75 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-U__FLOAT128__ -U__SIZEOF_FLOAT128__ -march=native"
cmake --build build -j

Replace sm_75 and 75 with the strings matching your GPU's Compute Capability. The -U__FLOAT128__ and -U__SIZEOF_FLOAT128__ flags undefine the corresponding macros, which is required due to an open bug in Clang.
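
As an illustration of the substitution, the architecture strings can be derived from the Compute Capability by dropping the dot. This is a sketch only: cc_to_arch is a hypothetical helper, and the nvidia-smi query mentioned in the comment is assumed to be available, which is only the case on sufficiently recent drivers.

```shell
# Hypothetical helper: turn a Compute Capability such as "7.5" into "75".
cc_to_arch() {
    echo "$1" | tr -d '.'
}

# Example value; on recent drivers it may be reported by
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
cc="7.5"
arch="$(cc_to_arch "$cc")"

# Prints the two flags as they appear in the cmake command line.
echo "-DCMAKE_CUDA_ARCHITECTURES=$arch -DHIPSYCL_GPU_ARCH=sm_$arch"
```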

For GPU support with CUDA (experimental)

a) Either build ndzip with CUDA + NVCC ...

cmake -B build -DCMAKE_CUDA_ARCHITECTURES=75 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=native"
cmake --build build -j

Replace 75 with the value matching your GPU's Compute Capability.

If CMAKE_CXX_COMPILER was redefined above, you also need to specify the CUDA host compiler:

-DCMAKE_CUDA_HOST_COMPILER=/path/to/c++

b) ... or with CUDA + Clang

cmake -B build -DCMAKE_CUDA_COMPILER="$(which clang++)" -DCMAKE_CUDA_ARCHITECTURES=75 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-U__FLOAT128__ -U__SIZEOF_FLOAT128__ -march=native"
cmake --build build -j

The -U__FLOAT128__ and -U__SIZEOF_FLOAT128__ flags undefine the corresponding macros, which is required due to an open bug in Clang.

Compressing and decompressing files

build/compress -n <size> -i <uncompressed-file> -o <compressed-file> [-t float|double]
build/compress -d -n <size> -i <compressed-file> -o <decompressed-file> [-t float|double]

<size> consists of one to three arguments, depending on the dimensionality of the input grid. In the multi-dimensional case, the first number specifies the width of the slowest-iterating dimension.
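
A quick way to sanity-check the <size> arguments is to compare their product against the size of the input file. The sketch below assumes the uncompressed file is a raw array of values with no header, which this README does not state explicitly; grid_bytes is a hypothetical helper.

```shell
# Hypothetical helper: expected size in bytes of a raw grid file.
# usage: grid_bytes float|double <extent>...
grid_bytes() {
    case "$1" in
        float)  bytes=4 ;;   # single precision: 4 bytes per value
        double) bytes=8 ;;   # double precision: 8 bytes per value
        *) echo "unknown type: $1" >&2; return 1 ;;
    esac
    shift
    # Multiply the element size by every extent passed after the type.
    for extent in "$@"; do
        bytes=$((bytes * extent))
    done
    echo "$bytes"
}

# A 3D float grid whose slowest-iterating dimension has width 64.
grid_bytes float 64 128 256   # prints 8388608
```

If the printed number does not match the size of the input file, the extents or the -t type are likely wrong.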

By default, compress uses the single-threaded CPU compressor. Passing -e cpu-mt selects the multi-threaded CPU compressor; passing -e sycl or -e cuda selects the corresponding GPU compressor, if available.
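
Since compression is lossless, a round trip through any engine should reproduce the input byte for byte. The following sketch checks this; roundtrip and the NDZIP_COMPRESS variable are hypothetical, the .ndz and .out suffixes are arbitrary, and it assumes that -e is accepted together with -d.

```shell
# Path to the compress binary; build/compress is the location used in this README.
NDZIP_COMPRESS="${NDZIP_COMPRESS:-build/compress}"

# Hypothetical helper: compress, decompress, and compare against the input.
# usage: roundtrip <engine> <float|double> <input-file> <size>...
roundtrip() {
    engine=$1; dtype=$2; input=$3
    shift 3
    # The remaining arguments are the one to three <size> values.
    "$NDZIP_COMPRESS" -n "$@" -i "$input" -o "$input.ndz" -t "$dtype" -e "$engine" &&
    "$NDZIP_COMPRESS" -d -n "$@" -i "$input.ndz" -o "$input.out" -t "$dtype" -e "$engine" &&
    cmp -s "$input" "$input.out"
}

# Example (not run here): a 64 x 128 x 256 float grid on the multi-threaded CPU engine.
# roundtrip cpu-mt float grid.f32 64 128 256 && echo "round trip OK"
```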

Running unit tests

Only available if tests were enabled during the build (-DNDZIP_BUILD_TEST=YES).

build/encoder_test
build/sycl_bits_test  # only if built with SYCL support
build/sycl_ubench     # GPU microbenchmarks, only if built with SYCL support
build/cuda_bits_test  # only if built with CUDA support

See also

References

If you are using ndzip as part of your research, we kindly ask you to cite

  • Fabian Knorr, Peter Thoman, and Thomas Fahringer. "ndzip: A High-Throughput Parallel Lossless Compressor for Scientific Data". In 2021 Data Compression Conference (DCC), IEEE, 2021. [DOI] [Preprint PDF]

  • Fabian Knorr, Peter Thoman, and Thomas Fahringer. "ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs". In SC'21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, 2021. [DOI] [Preprint PDF]
