Deploying QwQ-32B on 2*H20 with sglang 0.4.3.post4 fails with a "cannot find file" error
parameters:
python -m sglang.launch_server --model-path ./Qwen/QwQ-32B --trust-remote-code --host 0.0.0.0 --port 8080 --served-model-name QwQ-32B --enable-metrics --mem-fraction-static 0.90 --tp 2 --context-length 16384 --reasoning-parser deepseek-r1
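Every failed nvcc step in the error log below stops at `fatal error: curand_kernel.h: No such file or directory`, and the compile command passes `-isystem /usr/local/cuda/include`. A quick way to confirm whether the header is really absent is a minimal sketch like the following. It assumes the toolkit lives at `/usr/local/cuda` unless `CUDA_HOME` is set; `find_header` is a hypothetical helper written for this check, not part of sglang or flashinfer.

```python
import os
from pathlib import Path


def find_header(header, include_dirs):
    """Return the directories from include_dirs that contain the given header file."""
    return [str(d) for d in map(Path, include_dirs) if (d / header).is_file()]


if __name__ == "__main__":
    # Assumption: CUDA_HOME, if set, points at the toolkit used by the JIT build;
    # otherwise fall back to the path that appears in the nvcc command line.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    include_dirs = [os.path.join(cuda_home, "include")]

    hits = find_header("curand_kernel.h", include_dirs)
    if hits:
        print("curand_kernel.h found in:", hits)
    else:
        print("curand_kernel.h not found in:", include_dirs)
```

If the header is reported as not found, the environment probably has only part of the CUDA toolkit installed (for example a runtime-only image), and the cuRAND development headers would need to be added before the flashinfer JIT build can succeed.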
error log:
[2025-03-07 15:54:04 TP1] TpModelWorkerClient hit an exception: Traceback (most recent call last): File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2104, in _run_ninja_build subprocess.run( File "/opt/app/python3.10/lib/python3.10/subprocess.py", line 526, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1. The above exception was the direct cause of the following exception: Traceback (most recent call last): File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 109, in forward_thread_func self.forward_thread_func_() File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker_overlap_thread.py", line 140, in forward_thread_func_ logits_output, next_token_ids = self.worker.forward_batch_generation( File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py", line 172, in forward_batch_generation logits_output = self.model_runner.forward(forward_batch) File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 909, in forward return self.forward_extend( File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/model_executor/model_runner.py", line 866, in forward_extend self.attn_backend.init_forward_metadata(forward_batch) File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 237, in init_forward_metadata self.indices_updater_prefill.update( File "/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 740, in update_single_wrapper self.call_begin_forward( File 
"/opt/app/python3.10/lib/python3.10/site-packages/sglang/srt/layers/attention/flashinfer_backend.py", line 887, in call_begin_forward wrapper_ragged.begin_forward( File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/prefill.py", line 2156, in plan self._cached_module = get_batch_prefill_module(self._backend)( File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/prefill.py", line 197, in backend_module module = gen_batch_prefill_module(backend, *args) File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/jit/attention/pytorch.py", line 563, in gen_batch_prefill_module return gen_customize_batch_prefill_module( File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/jit/attention/pytorch.py", line 1078, in gen_customize_batch_prefill_module return load_cuda_ops( File "/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/jit/core.py", line 123, in load_cuda_ops torch_cpp_ext.load( File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1314, in load return _jit_compile( File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1721, in _jit_compile _write_ninja_file_and_build_library( File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1833, in _write_ninja_file_and_build_library _run_ninja_build( File "/opt/app/python3.10/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2120, in _run_ninja_build raise RuntimeError(message) from e RuntimeError: Error building extension 'batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90': [1/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90_jit_pybind.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_sm90_jit_pybind.cuda.o FAILED: batch_prefill_sm90_jit_pybind.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90_jit_pybind.cuda.o.d 
-DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem/usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu -o batch_prefill_sm90_jit_pybind.cuda.o In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40, from 
/root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu:16: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu:16: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_jit_pybind.cu:16: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. fatal : Could not open input file /tmp/tmpxft_00000510_00000000-8_batch_prefill_sm90_jit_pybind.compute_90.cpp1.ii [2/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem 
/opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_ragged_sm90_kernel_mask_0.cuda.o FAILED: batch_prefill_ragged_sm90_kernel_mask_0.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem 
/opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu -o batch_prefill_ragged_sm90_kernel_mask_0.cuda.o In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_0.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
fatal : Could not open input file /tmp/tmpxft_0000050a_00000000-8_batch_prefill_ragged_sm90_kernel_mask_0.compute_90.cpp1.ii [3/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_ragged_sm90_kernel_mask_2.cuda.o FAILED: 
batch_prefill_ragged_sm90_kernel_mask_2.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu -o batch_prefill_ragged_sm90_kernel_mask_2.cuda.o In file included from 
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_2.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. fatal : Could not open input file /tmp/tmpxft_0000050e_00000000-8_batch_prefill_ragged_sm90_kernel_mask_2.compute_90.cpp1.ii [4/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem 
/opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_paged_sm90_kernel_mask_1.cuda.o FAILED: batch_prefill_paged_sm90_kernel_mask_1.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC-isystem /usr/local/cuda/include -isystem 
/opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu -o batch_prefill_paged_sm90_kernel_mask_1.cuda.o In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_1.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
fatal : Could not open input file /tmp/tmpxft_0000050b_00000000-8_batch_prefill_paged_sm90_kernel_mask_1.compute_90.cpp1.ii [5/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_ragged_sm90_kernel_mask_1.cuda.o FAILED: 
batch_prefill_ragged_sm90_kernel_mask_1.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_ragged_sm90_kernel_mask_1.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu -o batch_prefill_ragged_sm90_kernel_mask_1.cuda.o In file included from 
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_ragged_sm90_kernel_mask_1.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. fatal : Could not open input file /tmp/tmpxft_0000050c_00000000-8_batch_prefill_ragged_sm90_kernel_mask_1.compute_90.cpp1.ii [6/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem 
/opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_paged_sm90_kernel_mask_2.cuda.o FAILED: batch_prefill_paged_sm90_kernel_mask_2.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_2.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC-isystem /usr/local/cuda/include -isystem 
/opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu -o batch_prefill_paged_sm90_kernel_mask_2.cuda.o In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_2.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
fatal : Could not open input file /tmp/tmpxft_0000050d_00000000-8_batch_prefill_paged_sm90_kernel_mask_2.compute_90.cpp1.ii [7/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_paged_sm90_kernel_mask_0.cuda.o FAILED: 
batch_prefill_paged_sm90_kernel_mask_0.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_paged_sm90_kernel_mask_0.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC-isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu -o batch_prefill_paged_sm90_kernel_mask_0.cuda.o In file included from 
/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/../../cutlass_utils.cuh:40, from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/attention/hopper/prefill_sm90.cuh:21, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_paged_sm90_kernel_mask_0.cu:1: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. fatal : Could not open input file /tmp/tmpxft_00000509_00000000-8_batch_prefill_paged_sm90_kernel_mask_0.compute_90.cpp1.ii [8/9] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem 
/opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 --expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_sm90.cuda.o FAILED: batch_prefill_sm90.cuda.o /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output batch_prefill_sm90.cuda.o.d -DTORCH_EXTENSION_NAME=batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/csrc -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/include -I/opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/TH -isystem /opt/app/python3.10/lib/python3.10/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/app/python3.10/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 
--expt-relaxed-constexpr -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 --compiler-options '-fPIC' -O3 -std=c++17 --threads 4 -use_fast_math -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -gencode=arch=compute_90a,code=sm_90a -c /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu -o batch_prefill_sm90.cuda.o In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu:26: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. 
In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu:26: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. In file included from /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/include/flashinfer/cutlass_utils.cuh:40, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90_config.inc:6, from /root/.cache/flashinfer/90/generated/batch_prefill_with_kv_cache_dtype_q_bf16_dtype_kv_bf16_dtype_o_bf16_dtype_idx_i32_head_dim_qk_128_head_dim_vo_128_posenc_0_use_swa_False_use_logits_cap_False_f16qk_False_sm90/batch_prefill_sm90.cu:26: /opt/app/python3.10/lib/python3.10/site-packages/flashinfer/data/cutlass/tools/util/include/cutlass/util/reference/device/tensor_fill.h:52:10: fatal error: curand_kernel.h: No such file or directory 52 | #include <curand_kernel.h> | ^~~~~~~~~~~~~~~~~ compilation terminated. fatal : Could not open input file /tmp/tmpxft_0000050f_00000000-8_batch_prefill_sm90.compute_90.cpp1.ii ninja: build stopped: subcommand failed.
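Every failing step in the log above ends in `fatal error: curand_kernel.h: No such file or directory`, even though nvcc is given `-isystem /usr/local/cuda/include`. That suggests the CUDA install in this environment may be missing the cuRAND development headers (they normally ship with the full CUDA toolkit), so the FlashInfer JIT build of the sm90 prefill kernels cannot compile. Below is a minimal sketch for checking this. The helper names `find_header` and `cuda_include_dirs` are hypothetical, and treating `CUDA_HOME` / `CUDA_PATH` as toolkit roots is an assumption about the environment; only `/usr/local/cuda` comes from the log.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def find_header(name: str, include_dirs: Iterable[str]) -> Optional[Path]:
    """Return the first existing <dir>/<name> among include_dirs, or None."""
    for directory in include_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def cuda_include_dirs() -> List[str]:
    """Collect candidate CUDA include directories (hypothetical helper).

    CUDA_HOME and CUDA_PATH are assumed conventions; /usr/local/cuda is the
    toolkit path that appears in the nvcc command lines of the log.
    """
    roots = [os.environ.get("CUDA_HOME"), os.environ.get("CUDA_PATH"), "/usr/local/cuda"]
    dirs: List[str] = []
    for root in roots:
        if root:
            include = str(Path(root) / "include")
            if include not in dirs:
                dirs.append(include)
    return dirs


if __name__ == "__main__":
    hit = find_header("curand_kernel.h", cuda_include_dirs())
    if hit:
        print(f"curand_kernel.h found at {hit}")
    else:
        print("curand_kernel.h not found in:", ", ".join(cuda_include_dirs()))
```

If the header is reported as missing, installing the cuRAND development headers that match the installed CUDA version (or using a full CUDA toolkit image rather than a runtime-only one) is the likely next step, though this has not been confirmed for this setup.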
@yzh119