[QUESTION] Are there targeted optimizations for the ada architecture? #56

dz1iang · 2025-03-06T02:07:11Z

Hi, I noticed that you've been running benchmarks on the L20. Are there any targeted optimizations for the Ada architecture?

houqi · 2025-03-06T23:26:47Z

Most of them are not tuned. You can tune them yourself.

For GEMM+RS and AG+GEMM, use the tools here: https://github.com/bytedance/flux/tree/main/tools

For MoE-related kernels, there are no tuning tools yet. A PR is welcome.

dz1iang · 2025-03-07T06:21:18Z

Thanks, I will try it.

wenlei-bao · 2025-03-12T01:33:27Z

We recently open-sourced the MoE part, and the related tuning script can be found here. You can use it as a reference.

dz1iang · 2025-03-12T06:27:26Z

I pulled the latest code right away. When I compiled it on Ada, the following error occurred. Do you have any suggestions for fixing it? @wenlei-bao

<bytedance::flux::GemmStreamkModeEnum::SK> >, bytedance::flux::None, cute::tuple<cute::C<128>, cute::C<128>, cute::C<64> >, cute::C<bytedance::flux::GemmKindEnum::GemmStreamK>, cute::C<3>, cute::C<bytedance::flux::GemmRasterOrderEnum::AlongM>}; bytedance::flux::OpRegistry::OpCreator = std::function<std::unique_ptr<bytedance::flux::GemmOperatorBase>()>]’
/flux/build/src/ag_gemm/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu:585:135:   required from here
/flux/include/flux/flux.h:1047:34: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
  237 | CUTE_BINARY_OP( <);
      | ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note:   template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:34: note:   ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/include/flux/flux.h:1047:88: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |                                                                                        ^
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
  237 | CUTE_BINARY_OP( <);
      | ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note:   template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:88: note:   ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |                                                                                        ^
make[2]: *** [src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/build.make:92: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu.o] Error 1
make[2]: Leaving directory '/flux/build'
make[1]: *** [CMakeFiles/Makefile2:565: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/all] Error 2
make[1]: Leaving directory '/flux/build'
make: *** [Makefile:136: all] Error 2
+ merge_compile_commands
+ cd /flux
+ command -v ninja
++ ls './build/temp.*/build.ninja'
ls: cannot access './build/temp.*/build.ninja': No such file or directory
+ ninja -f -t compdb
ninja: error: loading '-t': No such file or directory

houqi · 2025-03-12T07:56:58Z

Try cleaning your workspace, then follow README.md and recompile.

Note that you have to run this script before build.sh: https://github.com/bytedance/flux/blob/main/install_deps.sh

Also make sure you compile with the right image. We use NVCC 12.4 + GCC 12.
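
The steps above could be scripted roughly as follows. This is only a sketch, not something taken from the flux repository: the helper names `version_matches` and `rebuild` and the `rebuild` argument are made up for illustration, the script assumes it is run from the repository root, and it assumes `build/` is the build directory (as the `/flux/build` paths in the log suggest). Any flags that build.sh needs should come from README.md; the script just passes its arguments through.

```shell
#!/bin/sh
# Sketch: check the toolchain, clean the workspace, then rebuild.
# Helper names are illustrative; they are not part of flux.
set -eu

# Succeed if version $1 equals $2 or starts with "$2." (e.g. 12.4.131 matches 12.4).
version_matches() {
  case "$1" in
    "$2"|"$2".*) return 0 ;;
    *) return 1 ;;
  esac
}

rebuild() {
  # nvcc prints a line such as "Cuda compilation tools, release 12.4, V12.4.131".
  nvcc_version=$(nvcc --version | sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p')
  # gcc -dumpversion prints "12" or "12.3.0" depending on how GCC was configured.
  gcc_version=$(gcc -dumpversion)

  version_matches "$nvcc_version" 12.4 ||
    echo "warning: found NVCC $nvcc_version, the maintainers use 12.4" >&2
  version_matches "$gcc_version" 12 ||
    echo "warning: found GCC $gcc_version, the maintainers use 12" >&2

  # Assumption: build/ holds all stale build output.
  rm -rf build

  # install_deps.sh has to run before build.sh.
  ./install_deps.sh
  ./build.sh "$@"
}

# Only rebuild when asked to, so the helpers can be reused on their own.
if [ "${1:-}" = "rebuild" ]; then
  shift
  rebuild "$@"
fi
```

Saved under a name of your choice, it would be invoked with `rebuild` as the first argument, followed by whatever build.sh flags README.md recommends for your GPU. A version mismatch only prints a warning, since other toolchains may still work.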
