[ROCM] Fix feature flags for gfx1100 and improve flag handling (iree-org#18781)

For gfx1100, the feature string being passed was
`+wavefrontsize32-fma-mix-insts`. This is clearly invalid, as the flags
aren't separated. This patch improves the feature handling for multiple
flags.
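
To make the failure mode concrete, here is a minimal sketch using only the standard library. The helper names `concatFeatures` and `joinFeatures` are hypothetical and do not appear in IREE; `joinFeatures` is a simplified stand-in for the `llvm::join` call the patch uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified model of the old behavior: every flag is appended directly
// to one string, so nothing separates one flag from the next.
std::string concatFeatures(const std::vector<std::string> &flags) {
  std::string out;
  for (const std::string &flag : flags)
    out += flag;
  return out;
}

// Simplified model of the patched behavior: flags are kept as separate
// entries and joined with ',' only once the list is complete.
std::string joinFeatures(const std::vector<std::string> &flags) {
  std::string out;
  for (std::size_t i = 0; i < flags.size(); ++i) {
    if (i > 0)
      out += ',';
    out += flags[i];
  }
  return out;
}
```

With the two flags from the gfx1100 case, `concatFeatures` yields `+wavefrontsize32-fma-mix-insts`, while `joinFeatures` yields `+wavefrontsize32,-fma-mix-insts`.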

This fix also found a bug with passing -fma-mix-insts on gfx1100, which
causes a crash in the compiler:

```
iree-compile: /home/kunwar/Work/iree/third_party/llvm-project/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:118: HardClauseType (anonymous namespace)::SIInsertHardClauses::getHardClauseType(const MachineInstr &): Assertion `ST->getGeneration() >= AMDGPUSubtarget::GFX11' failed.
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.    Running pass 'CallGraph Pass Manager' on module '_winograd_input_nchw_dispatch_0'.
1.    Running pass 'SI Insert Hard Clauses' on function '@_winograd_input_nchw_dispatch_0_winograd_input_transform_8x8x1x1x1x1xf32_dispatch_tensor_store'
iree-compile: /home/kunwar/Work/iree/third_party/llvm-project/llvm/lib/Target/AMDGPU/SIInsertHardClauses.cpp:118: HardClauseType (anonymous namespace)::SIInsertHardClauses::getHardClauseType(const MachineInstr &): Assertion `ST->getGeneration() >= AMDGPUSubtarget::GFX11' failed.
Aborted (core dumped)
```

To fix this, the patch also restricts passing -fma-mix-insts to gfx9 targets.
Groverkss authored Oct 15, 2024
1 parent afe18d2 commit a3d8ad6
Showing 1 changed file with 14 additions and 8 deletions.
22 changes: 14 additions & 8 deletions compiler/plugins/target/ROCM/ROCMTarget.cpp
@@ -433,30 +433,36 @@ class ROCMTargetBackend final : public TargetBackend {
       opt.UnsafeFPMath = false;
       opt.NoInfsFPMath = false;
       opt.NoNaNsFPMath = true;
-      std::string features;
+      SmallVector<std::string> features;
       if (targetArch.starts_with("gfx10") ||
           targetArch.starts_with("gfx11")) {
         switch (subgroupSize.value_or(64)) {
         case 32:
-          features = "+wavefrontsize32";
+          features.emplace_back("+wavefrontsize32");
           break;
         default:
         case 64:
-          features = "+wavefrontsize64";
+          features.emplace_back("+wavefrontsize64");
           break;
         }
       }
-      if (!targetFeatures.empty()) {
-        features += (features.empty() ? "" : ",") + targetFeatures.str();
-      }

       // Mixed precision fma instructions have complicated semantics on
       // gf9+ GPUs and can lead to numeric issues as seen in
       // https://github.com/iree-org/iree/issues/18746 so we disable this
       // feature.
-      features += "-fma-mix-insts";
+      if (targetArch.starts_with("gfx9")) {
+        features.emplace_back("-fma-mix-insts");
+      }
+
+      if (!targetFeatures.empty()) {
+        features.emplace_back(targetFeatures.str());
+      }
+
+      std::string featureStr = llvm::join(features, ",");
+
       targetMachine.reset(target->createTargetMachine(
-          triple.str(), targetArch, features, opt, llvm::Reloc::Model::PIC_,
+          triple.str(), targetArch, featureStr, opt, llvm::Reloc::Model::PIC_,
           std::nullopt, llvm::CodeGenOptLevel::Aggressive));

       if (!targetMachine) {
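
The flag selection in the hunk above can be exercised in isolation with a sketch like the following. It assumes C++17 and uses only the standard library. `buildFeatureString` and its parameters are hypothetical names introduced for illustration, and the hand-written join loop stands in for `llvm::join`; this is not the code in ROCMTarget.cpp.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical standalone helper mirroring the patched logic.
std::string buildFeatureString(const std::string &targetArch,
                               std::optional<int> subgroupSize,
                               const std::string &targetFeatures) {
  // Equivalent of StringRef::starts_with for std::string.
  auto startsWith = [&](const std::string &prefix) {
    return targetArch.rfind(prefix, 0) == 0;
  };

  std::vector<std::string> features;

  // Wavefront size is only selectable on gfx10/gfx11; default to 64.
  if (startsWith("gfx10") || startsWith("gfx11")) {
    features.emplace_back(subgroupSize.value_or(64) == 32
                              ? "+wavefrontsize32"
                              : "+wavefrontsize64");
  }

  // Only gfx9 targets get -fma-mix-insts; per the commit message,
  // passing it on gfx1100 crashed the compiler.
  if (startsWith("gfx9"))
    features.emplace_back("-fma-mix-insts");

  // User-supplied features go last, as one entry.
  if (!targetFeatures.empty())
    features.emplace_back(targetFeatures);

  // Join with ',' (stand-in for llvm::join(features, ",")).
  std::string out;
  for (std::size_t i = 0; i < features.size(); ++i) {
    if (i > 0)
      out += ',';
    out += features[i];
  }
  return out;
}
```

Under these assumptions, `buildFeatureString("gfx1100", 32, "")` returns `+wavefrontsize32` with no trailing `-fma-mix-insts`, and `buildFeatureString("gfx90a", std::nullopt, "")` returns `-fma-mix-insts`.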