Insights: pytorch/ao
Overview
48 Pull requests merged by 12 people
- [mxfp8 moe training] add per group blocked scale kernels (#2886, merged Aug 28, 2025)
- Rename to_float8 to from_hp (#2893, merged Aug 28, 2025)
- [mxfp8 moe training] add grouped gemm benchmark script (#2882, merged Aug 28, 2025)
- [mxfp8 moe training] refactor all var names with suffix _mx to _data for clarity (#2879, merged Aug 28, 2025)
- [fp8 blockwise] wrap triton quantization kernels in custom ops for torch.compile compatibility (#2829, merged Aug 28, 2025; see the sketch after this list)
- exclude libcudart.so.13 from auditwheel repair to fix CUDA 13.0 wheel build (#2892, merged Aug 28, 2025)
- [CPU][float8] Add scaled_embedding_bag kernel (#2686, merged Aug 28, 2025)
- Support QAT int4 v1 path for BC (#2888, merged Aug 28, 2025)
- [fp8 blockwise] load 2d chunks for groupwise quant to enable coalesced gmem accesses (#2827, merged Aug 27, 2025)
- use shared bench + profile utils in blockwise fwd bwd bench script (#2826, merged Aug 27, 2025)
- integrate torch._scaled_mm into Float8BlockwiseLinear and add bench script (#2785, merged Aug 27, 2025)
- [moe fp8 training] fused reduction kernel along dim1 for 3d expert weights in backward (#2865, merged Aug 27, 2025)
- Update AWQ implementation to not use extra wrapper tensor subclass (#2753, merged Aug 27, 2025)
- [mxfp8 moe] add support for fbgemm 2d-3d mx8mx8bf16 grouped gemm (#2848, merged Aug 27, 2025)
- Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT (#2742, merged Aug 27, 2025)
- [moe fp8 training] use transpose method when quantizing to avoid uncoalesced gmem accesses (#2864, merged Aug 27, 2025)
- [moe fp8 training] test and bench new faster method for per group rowwise scaling (#2863, merged Aug 27, 2025)
- [CPU] Introduce Int4OpaqueTensor to replace Int4CPULayout in AQT (#2798, merged Aug 27, 2025)
- Enable quantizing local checkpoints in model release script (#2859, merged Aug 26, 2025)
- Conditional ROCm kernel build (#2839, merged Aug 26, 2025)
- fix ci import error (#2876, merged Aug 26, 2025)
- TorchAOBaseTensor __tensor_flatten__ and __tensor_unflatten__ use… (#2874, merged Aug 26, 2025)
- Update IntxUnpackedTensor to support dynamic activation (#2861, merged Aug 26, 2025)
- release notes script: keep not user facing rows (#2875, merged Aug 26, 2025)
- Fix UT assertion error for int8 sdpa fusion (#2816, merged Aug 26, 2025)
- Add OPAQUE packing format (#2878, merged Aug 26, 2025)
- add mxfp8 to test_tp (#2870, merged Aug 25, 2025)
- Fix test tolerance (#2871, merged Aug 25, 2025)
- [mxfp8 moe training] Add mxfp8 to FSDP tests (#2849, merged Aug 25, 2025)
- bump version to 0.14.0 (#2872, merged Aug 25, 2025)
- Add NVFP4 QAT (#2666, merged Aug 25, 2025)
- [reland] Refactor TorchAOBaseTensor for better BC (#2793) (#2855, merged Aug 23, 2025)
- Add test for lut based embedding quantization. (#2825, merged Aug 23, 2025)
- Add lut quantized embedding. (#2824, merged Aug 23, 2025)
- Fix autoquant after version util changes (#2858, merged Aug 22, 2025)
- Fix test_nvfp4_tensor.py merge conflict (#2857, merged Aug 22, 2025)
- Fix NVFP4 to_copy (#2812, merged Aug 22, 2025)
- Remove TORCH_VERSION_AT_LEAST* warnings when importing torch (#2852, merged Aug 22, 2025)
- Fix float8 + int4 QAT (#2851, merged Aug 22, 2025)
- Revert "Refactor TorchAOBaseTensor for better BC support" (#2854, merged Aug 22, 2025)
- fix incorrect torch version test (#2786, merged Aug 22, 2025)
- Refactor TorchAOBaseTensor for better BC support (#2793, merged Aug 22, 2025)
- Revert "Add the ops for groupwise lut quantization for embeding" (#2850, merged Aug 22, 2025)
- float8 kernel test: make more robust (#2847, merged Aug 22, 2025)
- [mxfp8 moe] replace per group scaling with conventional scaling (#2841, merged Aug 22, 2025)
- [mxfp8 moe] add compile test; add mxfp8 to bench script (#2835, merged Aug 22, 2025)
- mx: delete use_fp4_custom_triton_dequant_kernel option (#2831, merged Aug 22, 2025)
- mx: delete triton_f4_to_bf16 kernel (#2830, merged Aug 22, 2025)
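For context on #2829 above: wrapping an opaque (e.g. Triton) kernel in a PyTorch custom op lets torch.compile treat it as a single traceable node instead of trying to inline it. Below is a minimal sketch of that general pattern using the public torch.library API; the op name demo::quant_rowwise_fp8 and its eager stand-in body are illustrative assumptions, not torchao's actual kernels.

```python
# Sketch of wrapping an opaque quantization kernel in a custom op so that
# torch.compile can trace it. Op name and body are hypothetical.
import torch

@torch.library.custom_op("demo::quant_rowwise_fp8", mutates_args=())
def quant_rowwise_fp8(x: torch.Tensor) -> torch.Tensor:
    # A real implementation would launch a Triton kernel here;
    # eager ops stand in for it in this sketch.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-12) / 448.0
    return (x / scale).to(torch.float8_e4m3fn)

@quant_rowwise_fp8.register_fake
def _(x):
    # Shape/dtype-only implementation so the compiler can trace without running it.
    return torch.empty_like(x, dtype=torch.float8_e4m3fn)

compiled = torch.compile(lambda t: quant_rowwise_fp8(t))
out = compiled(torch.randn(4, 16))
```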
17 Pull requests opened by 10 people
- Add Int4XPUTensorIntZP (#2845, opened Aug 22, 2025)
- Fix Llama4 example (#2846, opened Aug 22, 2025)
- use gcnArchName to get gpu_arch (#2853, opened Aug 22, 2025)
- [moe training] add test case for shared expert in distributed tests (#2856, opened Aug 22, 2025)
- Port metadata from the linear node onto the reference custom op for int4 (#2860, opened Aug 22, 2025)
- Split implements and implements_torch_function (#2866, opened Aug 24, 2025)
- Move CPU kernels out of experimental (#2868, opened Aug 25, 2025)
- safetensors support (#2881, opened Aug 26, 2025; see the sketch after this list)
- Fix Float8Tensor quantize op kernel preference dispatch (#2883, opened Aug 26, 2025)
- Float8Tensor per row quantization pass bias to fbgemm kernel (#2884, opened Aug 26, 2025)
- [not for land] guard against fbgemm_gpu core dump (#2887, opened Aug 27, 2025)
- [mxfp8 moe training] add triton kernel for blocked swizzled 3d weight scales (#2894, opened Aug 28, 2025)
- Add tracking for new tensors, AQT and layouts (#2895, opened Aug 28, 2025)
- [mxfp8 moe training] use dim1 cast cuda kernel in bwd (#2897, opened Aug 28, 2025)
- hf integration doc page (#2899, opened Aug 28, 2025)
- [mxfp8 moe training] integrate triton kernels for converting scales to blocked format (#2902, opened Aug 28, 2025)
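For context on #2881 above: safetensors stores a flat dict of plain tensors. The sketch below shows that basic round trip with the real safetensors.torch API; how torchao's quantized tensor subclasses are flattened into this format is exactly what the PR has to decide and is not shown here.

```python
# Plain safetensors round trip (real API); the torchao-specific mapping of
# quantized tensor subclasses onto this flat format is not shown.
import torch
from safetensors.torch import save_file, load_file

state = {"linear.weight": torch.randn(128, 64), "linear.bias": torch.randn(128)}
save_file(state, "checkpoint.safetensors")      # flat dict of plain tensors
restored = load_file("checkpoint.safetensors")  # mmap-backed load
assert torch.equal(state["linear.bias"], restored["linear.bias"])
```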
5 Issues closed by 2 people
- Using triton_op + wrap_triton introduces kernel performance regression (#2898, closed Aug 28, 2025)
- fbgemm mxfp8 grouped gemm issues (#2877, closed Aug 26, 2025)
- NVFP4Tensor to_copy is wrong? (#2811, closed Aug 22, 2025)
- `torch_version_at_least` semantics are incorrect (#2722, closed Aug 22, 2025; see the sketch after this list)
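For context on #2722 above: a common way such version helpers go wrong is lexical string comparison, where "2.10.0" sorts before "2.8.0". A hedged sketch of the pitfall and a robust check using the packaging library; torchao's actual fix may differ.

```python
# Why string comparison breaks version checks, and a robust alternative.
from packaging.version import parse

assert "2.10.0" < "2.8.0"                 # lexical comparison: wrong answer
assert parse("2.10.0") > parse("2.8.0")   # semantic comparison: correct

def version_at_least(current: str, minimum: str) -> bool:
    # parse() also handles dev builds like "2.9.0a0+git1234" sensibly.
    return parse(current) >= parse(minimum)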
12 Issues opened by 8 people
- float8 rowwise scaled grouped mm doesn't support B200 (#2904, opened Aug 28, 2025)
- Loading fp8-int4 model got unexpected keyword argument 'requires_grad' (#2903, opened Aug 28, 2025)
- Aborted (core dumped) when importing v0.13.0 RC (#2901, opened Aug 28, 2025)
- [CPU][FP8][Inductor] How to support fp8 quant for inductor on CPU (#2896, opened Aug 28, 2025)
- CI ROCm tests failing with "HW exception - GPU hang" (#2890, opened Aug 27, 2025)
- Add build docs instructions to contributing guide (#2889, opened Aug 27, 2025)
- Meta kernel for an AO op is incorrect (#2885, opened Aug 26, 2025; see the sketch after this list)
- [MoE fp8 rowwise training] Runtime of quantizing 3d expert weights scales worse than linearly (#2880, opened Aug 26, 2025)
- Add HuggingFace integration doc page (#2873, opened Aug 25, 2025)
- CUDA OOM When Running AWQ int4 Quantized llama3.1-8b at Batch Size 1 (#2867, opened Aug 25, 2025)
- Duplicated tests in test_mx_tensor.py and test_nvfp4_tensor.py? (#2862, opened Aug 23, 2025)
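For context on #2885 above: a meta ("fake") kernel computes only output shapes and dtypes so the compiler can trace an op without running it; if it disagrees with the real kernel, traced programs silently see wrong shapes. A sketch with a hypothetical demo::pack_int4 op, not the actual torchao op from the issue.

```python
# A fake kernel must reproduce the real op's output shape and dtype exactly.
# The op below is hypothetical, for illustration only.
import torch

@torch.library.custom_op("demo::pack_int4", mutates_args=())
def pack_int4(x: torch.Tensor) -> torch.Tensor:
    # Pack two int4 values per byte along the last dim (stand-in for a real kernel).
    lo, hi = x[..., 0::2] & 0xF, x[..., 1::2] & 0xF
    return (lo | (hi << 4)).to(torch.uint8)

@pack_int4.register_fake
def _(x):
    # The bug class reported in #2885 is getting this shape or dtype wrong.
    return x.new_empty((*x.shape[:-1], x.shape[-1] // 2), dtype=torch.uint8)
```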
25 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- refactor commonly used toy model (#2729, commented on Aug 28, 2025 • 24 new comments)
- Make SmoothQuant more General (#2728, commented on Aug 28, 2025 • 23 new comments)
- [CPU][FP8] Support FP8 SDPA for CPU backend (#2689, commented on Aug 28, 2025 • 19 new comments)
- Add Int4TilePackedTo4dTensor (#2791, commented on Aug 28, 2025 • 9 new comments)
- Molly/enable xpu ci (#2814, commented on Aug 22, 2025 • 6 new comments)
- Xpu ut/quantization (#2796, commented on Aug 22, 2025 • 1 new comment)
- test fsdp2 Moe (#2842, commented on Aug 22, 2025 • 0 new comments)
- [test only] testing adding optional tensor arg to float8 tensor (#2840, commented on Aug 22, 2025 • 0 new comments)
- Add MyPy support (#2838, commented on Aug 26, 2025 • 0 new comments)
- [wip] mx: expose a fast path for casting to fp4x2 (#2832, commented on Aug 22, 2025 • 0 new comments)
- Add test function for the group wise lut quantization (#2703, commented on Aug 23, 2025 • 0 new comments)
- Replace `torch.norm` with `torch.linalg.vector_norm` (#2660, commented on Aug 25, 2025 • 0 new comments; see the sketch after this list)
- Update test_bitpacking.cpp. Bug fix in test. (#2633, commented on Aug 23, 2025 • 0 new comments)
- Refactor Wanda for better readability (#2538, commented on Aug 23, 2025 • 0 new comments)
- config change to enable pre compute scale for fp8 (#2536, commented on Aug 23, 2025 • 0 new comments)
- [CPU] Add support for dynamic float8 act float8 weight on CPU (#2505, commented on Aug 28, 2025 • 0 new comments)
- Groupwise low bit LUT based model quantization. (#2407, commented on Aug 23, 2025 • 0 new comments)
- add recipe config in aps for fp8 (#2322, commented on Aug 23, 2025 • 0 new comments)
- Benchmark AWQ and SmoothQuant within vLLM ecosystem (#2815, commented on Aug 28, 2025 • 0 new comments)
- [roadmap/tracker] Low precision MoE training (#2147, commented on Aug 27, 2025 • 0 new comments)
- Deprecate and remove subclass.py, dynamic_quant.py, and weight_only.py (#2745, commented on Aug 27, 2025 • 0 new comments)
- Deprecation for Float8DynamicActivationFloat8WeightConfig and Float8WeightOnlyConfig and the models (#2649, commented on Aug 27, 2025 • 0 new comments)
- Implement an AWQ algorithm with dynamic activation quantization for ExecuTorch (#2388, commented on Aug 27, 2025 • 0 new comments)
- [moe training] fsdp2 bug for llama4 shared experts where num_experts=1 (#2673, commented on Aug 26, 2025 • 0 new comments)
- [not user facing] Split `implements` and `implements_torch_function` (#2707, commented on Aug 24, 2025 • 0 new comments)
Aug 24, 2025 • 0 new comments