Deploying QwQ-32B with 0.4.3.post4 fails: cannot find curand_kernel.h #4169

YosanHo · 2025-03-07T08:02:55Z

Deploying QwQ-32B on 2×H20 with sglang 0.4.3.post4 fails with a "cannot find file" error: the flashinfer JIT build cannot find curand_kernel.h.

parameters:

python -m sglang.launch_server --model-path ./Qwen/QwQ-32B --trust-remote-code --host 0.0.0.0 --port 8080 --served-model-name QwQ-32B --enable-metrics --mem-fraction-static 0.90 --tp 2 --context-length 16384 --reasoning-parser deepseek-r1
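
The build log below fails on `#include <curand_kernel.h>`, with `/usr/local/cuda/include` passed to nvcc as the CUDA include directory. A quick check like the following may help confirm whether the cuRAND development header is actually present there. This is only a minimal sketch: the helper names `find_header` and `default_cuda_include_dirs` are hypothetical, and falling back to the `CUDA_HOME` / `CUDA_PATH` environment variables is an assumption about the environment, not something sglang or flashinfer is known to do.

```python
import os
from pathlib import Path


def find_header(name, include_dirs):
    """Return the first existing path to `name` under the given include dirs, else None."""
    for directory in include_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def default_cuda_include_dirs():
    """Candidate CUDA include directories to search (hypothetical helper)."""
    # /usr/local/cuda/include is the directory nvcc was given in the log.
    dirs = ["/usr/local/cuda/include"]
    # CUDA_HOME / CUDA_PATH are common conventions; using them here is an assumption.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = os.environ.get(var)
        if root:
            dirs.append(os.path.join(root, "include"))
    return dirs


if __name__ == "__main__":
    hit = find_header("curand_kernel.h", default_cuda_include_dirs())
    print(f"curand_kernel.h: {hit if hit else 'NOT FOUND'}")
```

If the header is reported as not found, the CUDA installation in this environment is probably missing the cuRAND development files, which would be consistent with the compiler error in the log.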

error log:

[2025-03-07 15:54:04 TP1] TpModelWorkerClient hit an exception: Traceback (most recent call last):
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2104, in _run_ninja_build
    subprocess.run(
  File "/opt/app/python3.10/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 109, in forward_thread_func
    self.forward_thread_func_()
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 140, in forward_thread_func_
    logits_output, next_token_ids = self.worker.forward_batch_generation(
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 172, in forward_batch_generation
    logits_output = self.model_runner.forward(forward_batch)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 909, in forward
    return self.forward_extend(
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 866, in forward_extend
    self.attn_backend.init_forward_metadata(forward_batch)
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 237, in init_forward_metadata
    self.indices_updater_prefill.update(
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 740, in update_single_wrapper
    self.call_begin_forward(
  File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 887, in call_begin_forward
    wrapper_ragged.begin_forward(
  File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/prefill.py", line 2156, in plan
    self._cached_module = get_batch_prefill_module(self._backend)(
  File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/prefill.py", line 197, in backend_module
    module = gen_batch_prefill_module(backend, *args)
  File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/jit/attention/pytorch.py", line 563, in gen_batch_prefill_module
    return gen_customize_batch_prefill_module(
  File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/jit/attention/pytorch.py", line 1078, in gen_customize_batch_prefill_module
    return load_cuda_ops(
  File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/jit/core.py", line 123, in load_cuda_ops
    torch_cpp_ext.load(
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1314, in load
    return _jit_compile(
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1721, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1833, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2120, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90': [1/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_sm90_jit_pybind.cuda.o
FAILED: batch_prefill_sm90_jit_pybind.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90_jit_pybind.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_sm90_jit_pybind.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu:16:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu:16:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu:16:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_00000510_00000000-8_batch_prefill_sm90_jit_pybind.compute_90.cpp1.ii
[2/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_ragged_sm90_kernel_mask_0.cuda.o
FAILED: batch_prefill_ragged_sm90_kernel_mask_0.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_ragged_sm90_kernel_mask_0.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_0000050a_00000000-8_batch_prefill_ragged_sm90_kernel_mask_0.compute_90.cpp1.ii
[3/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_ragged_sm90_kernel_mask_2.cuda.o
FAILED: batch_prefill_ragged_sm90_kernel_mask_2.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_ragged_sm90_kernel_mask_2.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_0000050e_00000000-8_batch_prefill_ragged_sm90_kernel_mask_2.compute_90.cpp1.ii
[4/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_paged_sm90_kernel_mask_1.cuda.o
FAILED: batch_prefill_paged_sm90_kernel_mask_1.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_paged_sm90_kernel_mask_1.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_0000050b_00000000-8_batch_prefill_paged_sm90_kernel_mask_1.compute_90.cpp1.ii
[5/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_ragged_sm90_kernel_mask_1.cuda.o
FAILED: batch_prefill_ragged_sm90_kernel_mask_1.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_ragged_sm90_kernel_mask_1.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_0000050c_00000000-8_batch_prefill_ragged_sm90_kernel_mask_1.compute_90.cpp1.ii
[6/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_paged_sm90_kernel_mask_2.cuda.o
FAILED: batch_prefill_paged_sm90_kernel_mask_2.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_paged_sm90_kernel_mask_2.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_0000050d_00000000-8_batch_prefill_paged_sm90_kernel_mask_2.compute_90.cpp1.ii
[7/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_paged_sm90_kernel_mask_0.cuda.o
FAILED: batch_prefill_paged_sm90_kernel_mask_0.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_paged_sm90_kernel_mask_0.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40,
                 from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu:1:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_00000509_00000000-8_batch_prefill_paged_sm90_kernel_mask_0.compute_90.cpp1.ii
[8/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_sm90.cuda.o
FAILED: batch_prefill_sm90.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_sm90.cuda.o
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu:26:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu:26:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6,
                 from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu:26:
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory
   52 | #include <curand_kernel.h>
      |          ^~~~~~~~~~~~~~~~~
compilation terminated.
fatal   : Could not open input file /tmp/tmpxft_0000050f_00000000-8_batch_prefill_sm90.compute_90.cpp1.ii
ninja: build stopped: subcommand failed.
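
Every failed step in the log stops at the same place: `tensor_fill.h` includes `<curand_kernel.h>`, and nvcc cannot find it on any of the `-I` / `-isystem` paths. `curand_kernel.h` is one of the cuRAND development headers, so one possible cause (an assumption, not confirmed in this thread) is a CUDA install at `/usr/local/cuda` that lacks them. A minimal sketch to check this on the affected machine is below; `find_header` and `INCLUDE_DIRS` are hypothetical names used only for illustration, and the directory list is copied from the nvcc command lines above.

```python
from pathlib import Path


def find_header(name, include_dirs):
    """Return the full path of `name` in each include directory that contains it."""
    return [str(Path(d) / name) for d in include_dirs if (Path(d) / name).is_file()]


# Include directories copied from the -I / -isystem flags in the log above.
# Adjust them if your environment differs.
INCLUDE_DIRS = [
    "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include",
    "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc",
    "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include",
    "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include",
    "/opt/app/python3.10/lib/python3.10/site-packages/torch/include",
    "/opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include",
    "/opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH",
    "/opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC",
    "/usr/local/cuda/include",
    "/opt/app/python3.10/include/python3.10",
]

if __name__ == "__main__":
    hits = find_header("curand_kernel.h", INCLUDE_DIRS)
    if hits:
        print("found:", *hits, sep="\n  ")
    else:
        print("curand_kernel.h is not on any include path taken from the log")
```

If the script reports the header as missing, the build will keep failing until a directory containing it is on the include path; how to get the header installed depends on how CUDA was set up on the machine.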

zhyncs · 2025-03-07T08:21:07Z

@yzh119

YosanHo closed this as completed Mar 8, 2025
