
[CUDA] [SYCL] -ffast-math causes NVPTX codegen error and/or llvm-link failure #7954

@lfmeadow

Description


Describe the bug
The -ffast-math switch causes backend (instruction selection) failures and/or llvm-link failures when some double-precision <cmath> functions, such as std::sin, std::cos and std::exp, are called in device code.
This was discovered while compiling LAMMPS with Kokkos, using SYCL targeting CUDA.

To Reproduce

$ cat exp.cpp
#include <iostream>
#include <cstdlib>
#include <sycl/sycl.hpp>
using namespace sycl;
#include <cmath>

// Create an exception handler for asynchronous SYCL exceptions
static auto exception_handler = [](sycl::exception_list e_list) {
  for (std::exception_ptr const &e : e_list) {
    try {
      std::rethrow_exception(e);
    }
    catch (std::exception const &e) {
#if _DEBUG
      std::cout << "Failure" << std::endl;
#endif
      std::terminate();
    }
  }
};

int
main()
{
  auto d_selector{default_selector_v};
  const int N = 1024;
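  queue q(d_selector, exception_handler);
  // in.data()/out.data() would be host memory that the CUDA device cannot
  // dereference; USM shared allocations are accessible from host and device.
  double *d_in = malloc_shared<double>(N, q);
  double *d_out = malloc_shared<double>(N, q);
  std::srand(1234);
  for (int i = 0; i < N; ++i)
    d_in[i] = std::rand() / (double) RAND_MAX;
  range num_items{N};
  auto e =
    q.parallel_for(num_items, [=](auto i) {
      //d_out[i] = std::exp(std::sin(d_in[i]) + std::cos(d_in[i]));
      d_out[i] = std::sin(d_in[i]) + std::cos(d_in[i]);
    });
  e.wait();
  std::cout << d_out[0] << std::endl;
  sycl::free(d_in, q);
  sycl::free(d_out, q);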
}
=====
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 --cuda-path=$CUDATOOLKIT_HOME exp.cpp -o exp -ffast-math
clang-16: warning: CUDA version 11.7 is only partially supported [-Wunknown-cuda-version]
fatal error: error in backend: Cannot select: t11: f64 = fsin nnan ninf nsz arcp contract afn reassoc t10
  t10: f64,ch = load<(load (s64) from %ir.arrayidx.i.i, !tbaa !65, addrspace 1)> t0, t7, undef:i64
    t7: i64 = add t2, t6
      t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t1: i64 = Register %1
      t6: i64 = shl t4, Constant:i32<3>
        t4: i64,ch = CopyFromReg t0, Register:i64 %2
          t3: i64 = Register %2
        t19: i32 = Constant<3>
    t9: i64 = undef
In function: _ZTSN4sycl3_V16detail18RoundedRangeKernelINS0_4itemILi1ELb1EEELi1EZ4mainEUlT_E_EE
llvm-foreach:
clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 16.0.0 (https://github.com/intel/llvm d164fd946341dba28d6759aa2938161ce0e83647)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /pscratch/sd/l/lfmeadow/llvm-build/install/bin
clang-16: note: diagnostic msg: Error generating preprocessed source(s).

If the std::exp variant of the kernel (the commented-out line) is compiled instead, a different error message appears.
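
Once the kernel builds (presumably without -ffast-math, or with a fixed toolchain), a host-side reference is a simple way to check whether fast-math's relaxed math functions (note the `afn` flag on the `fsin` node in the log) change the results. This is a minimal sketch, not part of the original report: the helper names make_inputs, reference and count_mismatches are hypothetical, and the tolerance you pass is an assumption to tune.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: regenerate the reproducer's inputs on the host with the
// same seed and formula as exp.cpp: std::srand(1234), then rand() / RAND_MAX.
std::vector<double> make_inputs(int n, unsigned seed = 1234) {
  std::vector<double> in(n);
  std::srand(seed);
  for (int i = 0; i < n; ++i)
    in[i] = std::rand() / (double) RAND_MAX;
  return in;
}

// Hypothetical helper: host reference for the kernel body,
// out[i] = sin(in[i]) + cos(in[i]). Compile this file without -ffast-math
// so the reference uses the host libm's regular (non-approximate) functions.
std::vector<double> reference(const std::vector<double>& in) {
  std::vector<double> out(in.size());
  for (std::size_t i = 0; i < in.size(); ++i)
    out[i] = std::sin(in[i]) + std::cos(in[i]);
  return out;
}

// Hypothetical helper: count elements whose relative error against the
// reference exceeds tol (the tolerance is an assumption, not a spec value).
std::size_t count_mismatches(const std::vector<double>& got,
                             const std::vector<double>& want, double tol) {
  assert(got.size() == want.size());
  std::size_t bad = 0;
  for (std::size_t i = 0; i < want.size(); ++i) {
    double scale = std::fabs(want[i]) > 0.0 ? std::fabs(want[i]) : 1.0;
    if (std::fabs(got[i] - want[i]) / scale > tol)
      ++bad;
  }
  return bad;
}
```

Typical use: after e.wait(), copy the N device results into a std::vector<double> and compare them with reference(make_inputs(1024)) via count_mismatches. Run this on the same host as the reproducer so std::rand yields the same sequence.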

Environment

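Linux on Perlmutter, NVIDIA A100 (sm_80), CUDA 11.7 (CUDATOOLKIT_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7), clang 16.0.0 built from intel/llvm commit d164fd946341dba28d6759aa2938161ce0e83647.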

