Tags: pytorch/ao
Update ROCm MFMA instruction syntax in sparse Marlin MMA implementation: Modify the MFMA instruction assembly for AMD GPUs to use correct syntax and operand handling. Replace register constraints with vector register constraints and simplify the instruction format to improve compatibility and readability on ROCm platforms.
Revert "Remove setup changes": This reverts commit fbe7ac2.
update test-infra to release version (#1391): Summary: pytorch/test-infra#6016 landed recently and is breaking our ROCm builds. We point to a special branch of test-infra, created just before that PR, to unblock the v0.7.0 release. Test Plan: CI. Also updates .github/workflows/build_wheels_linux.yml. Co-authored-by: Andrey Talman
Add TTFT benchmarks + update sparsity benchmarks (#1140): This PR adds TTFT (time to first token) benchmarks to torchAO and updates the benchmarking script to handle sparsity more cleanly and to use the available 2:4 sparse checkpoints. It also adds padding support for int8 dynamic quantization + 2:4 sparsity, which was previously missing.