[QUESTION] Are there targeted optimizations for the ada architecture? #56

Open
dz1iang opened this issue Mar 6, 2025 · 5 comments

@dz1iang

dz1iang commented Mar 6, 2025

Hi, I noticed that you've been running benchmarks on the L20. Are there any targeted optimizations for the Ada architecture?

@houqi
Collaborator

houqi commented Mar 6, 2025

Most of the kernels are not tuned for Ada. You can tune them yourself.

For GEMM+RS and AG+GEMM, use the tools here: https://github.com/bytedance/flux/tree/main/tools

For the MoE-related kernels there are no tuning tools yet. A PR is welcome.

@dz1iang
Author

dz1iang commented Mar 7, 2025

Thanks, I will try.

@wenlei-bao
Collaborator

wenlei-bao commented Mar 12, 2025

We recently open-sourced the MoE part, and the related tuning script can be found here. You can use it for reference.

@dz1iang
Author

dz1iang commented Mar 12, 2025

I pulled the latest code right away. When I compiled it on Ada, the following error occurred. Could you suggest how to fix it? @wenlei-bao

<bytedance::flux::GemmStreamkModeEnum::SK> >, bytedance::flux::None, cute::tuple<cute::C<128>, cute::C<128>, cute::C<64> >, cute::C<bytedance::flux::GemmKindEnum::GemmStreamK>, cute::C<3>, cute::C<bytedance::flux::GemmRasterOrderEnum::AlongM>}; bytedance::flux::OpRegistry::OpCreator = std::function<std::unique_ptr<bytedance::flux::GemmOperatorBase>()>]’
/flux/build/src/ag_gemm/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu:585:135:   required from here
/flux/include/flux/flux.h:1047:34: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
  237 | CUTE_BINARY_OP( <);
      | ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note:   template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:34: note:   ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |               ~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
/flux/include/flux/flux.h:1047:88: error: no match for ‘operator<’ (operand types are ‘const cute::tuple<long int, long int, long int>’ and ‘const cute::tuple<long int, long int, long int>’)
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |                                                                                        ^
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note: candidate: ‘template<auto t, auto u> constexpr cute::C<(t < u)> cute::operator<(cute::C<v>, cute::C<v>)’
  237 | CUTE_BINARY_OP( <);
      | ^~~~~~~~
/flux/3rdparty/cutlass/include/cute/numeric/integral_constant.hpp:237:1: note:   template argument deduction/substitution failed:
/flux/include/flux/flux.h:1047:88: note:   ‘cute::tuple<long int, long int, long int>’ is not derived from ‘cute::C<v>’
 1047 |       return bool(cute::get<I>(t) < cute::get<I>(u)) ||
      |                                                                                        ^
make[2]: *** [src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/build.make:92: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/registers/flux_bf16_bf16_bf16_bf16_fp32_fp32_sm89_agkernel_rrr_gemmv2_false_nil.cu.o] Error 1
make[2]: Leaving directory '/flux/build'
make[1]: *** [CMakeFiles/Makefile2:565: src/ag_gemm/CMakeFiles/flux_cuda_all_gather.dir/all] Error 2
make[1]: Leaving directory '/flux/build'
make: *** [Makefile:136: all] Error 2
+ merge_compile_commands
+ cd /flux
+ command -v ninja
++ ls './build/temp.*/build.ninja'
ls: cannot access './build/temp.*/build.ninja': No such file or directory
+ ninja -f -t compdb
ninja: error: loading '-t': No such file or directory

@houqi
Collaborator

houqi commented Mar 12, 2025

Try cleaning your workspace, then follow the README.md and recompile.

Note that you have to run this before build.sh: https://github.com/bytedance/flux/blob/main/install_deps.sh

Also make sure you compile with the right image. We use NVCC 12.4 + gcc 12.
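
To see which compilers an image actually uses, a small check like the one below may help. It is only a sketch and not part of flux: the file name and the names `Version`, `host_gcc_version`, `nvcc_version`, `matches_reference` and `describe` are made up for illustration. It relies on the predefined macros `__GNUC__` / `__GNUC_MINOR__` (GCC and GCC-compatible compilers) and `__CUDACC_VER_MAJOR__` / `__CUDACC_VER_MINOR__` (defined by nvcc), and compares them with the NVCC 12.4 + gcc 12 combination mentioned above.

```cpp
// toolchain_check.cu (hypothetical file name): report the compilers that built this file.
#include <cassert>
#include <cstdio>
#include <string>

struct Version {
  int major;
  int minor;
};

// Host compiler version from GCC's predefined macros.
// Returns {0, 0} if the host compiler does not define them.
inline Version host_gcc_version() {
#if defined(__GNUC__)
  return {__GNUC__, __GNUC_MINOR__};
#else
  return {0, 0};
#endif
}

// NVCC version from the macros nvcc defines while compiling.
// Returns {0, 0} when the file is not being compiled by nvcc.
inline Version nvcc_version() {
#if defined(__CUDACC_VER_MAJOR__)
  return {__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__};
#else
  return {0, 0};
#endif
}

// Reference toolchain quoted in this thread: NVCC 12.4 + gcc 12.
inline bool matches_reference(Version gcc, Version nvcc) {
  return gcc.major == 12 && nvcc.major == 12 && nvcc.minor == 4;
}

// One-line, human-readable summary of the two versions.
inline std::string describe(Version gcc, Version nvcc) {
  char buf[128];
  int n = std::snprintf(buf, sizeof(buf), "gcc %d.%d, nvcc %d.%d -> %s",
                        gcc.major, gcc.minor, nvcc.major, nvcc.minor,
                        matches_reference(gcc, nvcc) ? "matches reference"
                                                     : "differs from reference");
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf);
}
```

Calling `describe(host_gcc_version(), nvcc_version())` from a `main` in a file compiled with nvcc inside the build image prints both versions. A mismatch does not prove that the toolchain is the cause of the error above; it only shows whether the environment differs from the one the maintainers use.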
