[CUDA] [SYCL] -ffast-math causes nvptx codegen error and/or failed llvm link #7954

**Describe the bug**
Compiling with -ffast-math results in backend failures and/or llvm link failures when a kernel calls some double-precision `<cmath>` functions such as std::sin, std::cos, or std::exp.
This was discovered while compiling LAMMPS with Kokkos, using SYCL for CUDA.

**To Reproduce**
```
ogin29:jack$ cat exp.cpp
#include <iostream>
#include <cstdlib>
#include <cmath>
#include <exception>
#include <vector>
#include <sycl/sycl.hpp>
using namespace sycl;

// Create an exception handler for asynchronous SYCL exceptions
static auto exception_handler = [](sycl::exception_list e_list) {
  for (std::exception_ptr const &e : e_list) {
    try {
      std::rethrow_exception(e);
    }
    catch (std::exception const &e) {
#if _DEBUG
      std::cout << "Failure" << std::endl;
#endif
      std::terminate();
    }
  }
};

int
main()
{
  auto d_selector{default_selector_v};
  const int N = 1024;
  std::vector<double> in(N), out(N);
  std::srand(1234);
  for (int i = 0; i < N; ++i)
    in[i] = std::rand() / (double) RAND_MAX;
  double *d_in = in.data(), *d_out = out.data();
  queue q(d_selector, exception_handler);
  range num_items{N};
  auto e =
    q.parallel_for(num_items, [=](auto i) {
      //d_out[i] = std::exp(std::sin(d_in[i]) + std::cos(d_in[i]));
      d_out[i] = std::sin(d_in[i]) + std::cos(d_in[i]);
    });
  e.wait();
  std::cout << out[0];
}
=====
clang++ -fsycl -fsycl-targets=nvptx64-nvidia-cuda -Xsycl-target-backend --cuda-gpu-arch=sm_80 --cuda-path=$CUDATOOLKIT_HOME exp.cpp -o exp -ffast-math
clang-16: warning: CUDA version 11.7 is only partially supported [-Wunknown-cuda-version]
fatal error: error in backend: Cannot select: t11: f64 = fsin nnan ninf nsz arcp contract afn reassoc t10
  t10: f64,ch = load<(load (s64) from %ir.arrayidx.i.i, !tbaa !65, addrspace 1)> t0, t7, undef:i64
    t7: i64 = add t2, t6
      t2: i64,ch = CopyFromReg t0, Register:i64 %1
        t1: i64 = Register %1
      t6: i64 = shl t4, Constant:i32<3>
        t4: i64,ch = CopyFromReg t0, Register:i64 %2
          t3: i64 = Register %2
        t19: i32 = Constant<3>
    t9: i64 = undef
In function: _ZTSN4sycl3_V16detail18RoundedRangeKernelINS0_4itemILi1ELb1EEELi1EZ4mainEUlT_E_EE
llvm-foreach:
clang-16: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 16.0.0 (https://github.com/intel/llvm d164fd946341dba28d6759aa2938161ce0e83647)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /pscratch/sd/l/lfmeadow/llvm-build/install/bin
clang-16: note: diagnostic msg: Error generating preprocessed source(s).
```
If the commented-out line that calls std::exp is used instead, a different error message appears.
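
As a possible workaround to try, the kernel body in the reproducer above could call the SYCL built-in math functions instead of the `std::` ones. The sketch below is an assumption, not a confirmed fix: the "Cannot select ... fsin" message suggests that -ffast-math lets the compiler turn `std::sin` on a `double` into an LLVM intrinsic that the NVPTX backend cannot lower, and `sycl::sin` / `sycl::cos` may take a different path. Whether this avoids the failure with this exact toolchain is untested. The helper name `sin_plus_cos` is hypothetical. The `SYCL_LANGUAGE_VERSION` guard only lets the same helper also build with a plain host compiler.

```cpp
#include <cmath>
#if defined(SYCL_LANGUAGE_VERSION)
#include <sycl/sycl.hpp>
#endif

// Hypothetical helper, not part of the original reproducer.
// Computes sin(x) + cos(x) for a double argument.
inline double sin_plus_cos(double x) {
#if defined(SYCL_LANGUAGE_VERSION)
  // Compiled as SYCL: use the SYCL built-ins. Assumption: these are not
  // rewritten into the f64 sin/cos intrinsics that the backend rejects.
  return sycl::sin(x) + sycl::cos(x);
#else
  // Plain host build: fall back to the standard library.
  return std::sin(x) + std::cos(x);
#endif
}
```

In the reproducer, the kernel line would then read `d_out[i] = sin_plus_cos(d_in[i]);`. Building once with and once without -ffast-math would show whether the flag is still the trigger.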

**Environment**

Linux, NVIDIA A100 on Perlmutter, with CUDATOOLKIT_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7 (CUDA 11.7). Compiler: clang 16.0.0 built from https://github.com/intel/llvm at d164fd946341dba28d6759aa2938161ce0e83647.
