Insights: pytorch/ao
Overview
26 Pull requests merged by 12 people
- Update README.md to include Flux-Fast (#2457, merged Jun 28, 2025)
- Graduate debug handle in torchao (#2452, merged Jun 27, 2025)
- [float8 moe training] fix bug affecting mixed precision training (#2451, merged Jun 27, 2025)
- Add exportable coreml codebook quantization op (#2443, merged Jun 27, 2025)
- Store NVFP4 block scales in swizzled layout on tensor (#2438, merged Jun 26, 2025)
- Fixes issue #156414: Fixes bug in implementation of _combine_histogram (Follow up) (#2418, merged Jun 26, 2025)
- Improve tiling params to speed up prefill (#2406, merged Jun 26, 2025)
- float8 readme: add key features section (#2448, merged Jun 26, 2025)
- float8 readme: remove duplication (#2447, merged Jun 26, 2025)
- [CPU] Fix ref path of DA8W4 cpp kernel (#2444, merged Jun 26, 2025)
- Call out axolotl + QAT integration on README (#2442, merged Jun 25, 2025)
- [CPU] Enable DA8W4 on CPU (#2128, merged Jun 25, 2025)
- [float8] Prevent quantize_affine_float8/dequantize_affine_float8 decomposed on inductor (#2379, merged Jun 25, 2025)
- solve the test issue (#2432, merged Jun 24, 2025)
- add-to-benchmarks (#2427, merged Jun 24, 2025)
- Gemlite generate.py fix (#2372, merged Jun 24, 2025)
- enable tensor parallelism for MXLinear (#2434, merged Jun 24, 2025)
- NVfp4 (#2408, merged Jun 24, 2025)
- enable to_mxfp8 cast for DTensor (#2420, merged Jun 24, 2025)
- Update github links in torchao pt2e tutorial (#2435, merged Jun 24, 2025)
- [float8] add _auto_filter_for_recipe to float8 (#2410, merged Jun 24, 2025)
- make dtensor shared test util more generic (#2416, merged Jun 24, 2025)
- rename torchao.testing.float8 to torchao.testing.training (#2415, merged Jun 24, 2025)
- fix float8 training TP+SP integration tests (#2414, merged Jun 24, 2025)
- Revert "Build mxfp4 kernel for sm120a" (#2428, merged Jun 24, 2025)
- mitigate the numeric test issue (#2426, merged Jun 23, 2025)
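Several of the merged PRs above (#2451, #2410, #2414, #2415) touch torchao's float8 training path. For orientation, here is a minimal sketch of how a model is typically converted for float8 training with torchao; the exact behavior of `convert_to_float8_training` and its filter callback should be checked against the float8 README, and the toy model plus the 1024 dimension threshold are illustrative assumptions, not torchao defaults.

```python
# Minimal float8 training sketch (assumes the torchao.float8 API;
# verify names and defaults against the float8 README).
import torch
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).to(torch.bfloat16).cuda()

# Skip small layers: float8 matmuls only pay off for large GEMMs.
# The 1024 threshold is an illustrative assumption.
def module_filter_fn(mod: nn.Module, fqn: str) -> bool:
    return (
        isinstance(mod, nn.Linear)
        and mod.in_features >= 1024
        and mod.out_features >= 1024
    )

# Swap eligible nn.Linear modules for float8 training variants in place.
convert_to_float8_training(model, module_filter_fn=module_filter_fn)

# Training then proceeds as usual, optionally with torch.compile.
model = torch.compile(model)
```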
14 Pull requests opened by 10 people
- Add support for Int4GroupwisePreshuffleTensor for fbgemm (#2421, opened Jun 22, 2025)
- Remove `transpose_input` from fbgemm configs (#2422, opened Jun 22, 2025)
- enabling xpu in UT test (#2424, opened Jun 23, 2025)
- [float8 moe training] Add TP support (#2425, opened Jun 23, 2025)
- [DOC] Set strict export explicitly for API change (#2430, opened Jun 24, 2025)
- test rowwise fp32 (#2431, opened Jun 24, 2025)
- mxfp8 training: add TP sharding strategy for dim1 kernel (#2436, opened Jun 24, 2025)
- Add support for Float8ActivationInt4GroupwisePreshuffleTensor for fbgemm (#2437, opened Jun 24, 2025)
- Add kernel (#2439, opened Jun 25, 2025)
- [DRAFT] Enable CPU data layout convert to XPU (#2441, opened Jun 25, 2025)
- fix scale shape in mx_format conversion (#2446, opened Jun 26, 2025)
- [NF4] Support nf4 tensor shard and gather (#2449, opened Jun 26, 2025)
- [float8] add tests for float8 _auto_filter_for_recipe (#2450, opened Jun 26, 2025)
- [WIP] [moe training] ScaledGroupedMMTensor - set dtype (#2455, opened Jun 27, 2025)
3 Issues closed by 3 people
- Benefits of Using QAT Before GGUF Quantization? (#2419, closed Jun 25, 2025)
- [Quant] Can quant not be decomposed on inductor? (#2228, closed Jun 25, 2025)
- Question about the choice of use_fast_accum in FP8 Training (#2377, closed Jun 22, 2025)
7 Issues opened by 7 people
- How to not decompose the choose_qparams_affine call_func (#2456, opened Jun 27, 2025)
- [MoE training] crash with FSDP if shared_expert uses float8 in torchtitan llama4 (#2453, opened Jun 27, 2025)
- Gradient Checkpoint makes FP8 Training Slow (#2445, opened Jun 26, 2025)
- [BUG]: Gemlite integration not working (#2429, opened Jun 24, 2025)
- Training hangs after 1 epoch when using QAT (#2423, opened Jun 22, 2025)
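Two of the items above (#2423, #2419) and the merged README callout for axolotl + QAT (#2442) concern torchao's quantization-aware training flow. The sketch below shows the prepare/train/convert pattern, assuming the `Int8DynActInt4WeightQATQuantizer` entry point; the toy model and training loop are placeholders, and newer config-based QAT APIs may differ, so treat this as a sketch rather than the canonical recipe.

```python
# QAT flow sketch (assumes torchao.quantization.qat.Int8DynActInt4WeightQATQuantizer;
# the model and training loop below are placeholders).
import torch
import torch.nn as nn
from torchao.quantization.qat import Int8DynActInt4WeightQATQuantizer

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 64))

qat_quantizer = Int8DynActInt4WeightQATQuantizer()

# prepare(): insert fake quantization so training sees quantization error.
model = qat_quantizer.prepare(model)

# ... fine-tune as usual (placeholder loop on random data) ...
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for _ in range(10):
    loss = model(torch.randn(8, 512)).sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# convert(): swap fake-quant modules for actually quantized ones for inference.
model = qat_quantizer.convert(model)
```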
17 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Inference tutorial - Part 3 of e2e series [WIP] (#2343, commented on Jun 26, 2025 • 14 new comments)
- [Inductor] Support scaled mm on inductor (#2411, commented on Jun 25, 2025 • 4 new comments)
- Add benchmark numbers to dashboard (#2260, commented on Jun 28, 2025 • 2 new comments)
- Align scale dtype with model precision in GPTQ (#2403, commented on Jun 26, 2025 • 1 new comment)
- [WIP] Make AWQ more general (#2400, commented on Jun 25, 2025 • 0 new comments)
- Enables the per_tensor lowering patterns for weight per_packing (#2391, commented on Jun 26, 2025 • 0 new comments)
- moe quant with dedicated kernels [wip] (#2325, commented on Jun 26, 2025 • 0 new comments)
- Eval hf models using lm_eval (#2179, commented on Jun 23, 2025 • 0 new comments)
- [low-bit optim] Add coat for float8 optimizer (#1231, commented on Jun 26, 2025 • 0 new comments)
- [roadmap/tracker] Low precision MoE training (#2147, commented on Jun 28, 2025 • 0 new comments)
- Support `torch.int4` `target_dtype` for ops `choose_qparams_affine`, `quantize_affine`, `dequantize_affine` (#2354, commented on Jun 27, 2025 • 0 new comments)
- MX single node performance tracker (#1768, commented on Jun 26, 2025 • 0 new comments)
- low precision training upcoming feature tracker (#556, commented on Jun 26, 2025 • 0 new comments)
- int4_weight_only get plain weight are padded (#2249, commented on Jun 24, 2025 • 0 new comments)
- TP + FSDP + MXFP8 fails during compile (#2393, commented on Jun 24, 2025 • 0 new comments)
- BF16 stochastic rounding does not work distributed (FSDP) (#2296, commented on Jun 23, 2025 • 0 new comments)
- TorchAO Paper (#2412, commented on Jun 23, 2025 • 0 new comments)
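Several of the open conversations above (#2249, #2354) revolve around torchao's affine quantization primitives and the int4 weight-only path. For context, here is a minimal post-training weight-only quantization sketch using the documented `quantize_` API; the model shapes and group size are illustrative, and the padding of weights whose shapes are not multiples of the group size is exactly what #2249 discusses.

```python
# Weight-only int4 post-training quantization sketch
# (uses torchao's documented quantize_ API; model and group size are illustrative).
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).to(torch.bfloat16).cuda()

# Replace eligible nn.Linear weights with int4 groupwise-quantized tensors in place.
quantize_(model, int4_weight_only(group_size=128))

# Inference as usual; torch.compile typically selects the fast int4 kernels.
model = torch.compile(model, mode="max-autotune")
with torch.no_grad():
    out = model(torch.randn(16, 4096, dtype=torch.bfloat16, device="cuda"))
```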