Pulse · triton-lang/triton · GitHub

January 29, 2025 – February 5, 2025

Overview

77 Active pull requests

14 Active issues

64 Pull requests merged by 23 people

[mlir][dialect] Refactor DotLike trait into a DotOpInterface + Enable verification of scaled_dot
#5796 merged Feb 5, 2025
Bump actions/checkout from 3 to 4
#5714 merged Feb 5, 2025
Fix default num_stages values mismatch between Python frontend and MLIR.
#5804 merged Feb 5, 2025
[Blackwell] Enable MMA pipelining for scaled dot when TMEM copy is used
#5812 merged Feb 5, 2025
[Blackwell] Add support for mixed precision scaled dot
#5799 merged Feb 4, 2025
[LAYOUTS] Choose between wgmma RS and SS within AccelerateMatmul
#5798 merged Feb 4, 2025
[TUTORIAL] Apply remat perf fix to non-TMA persistent matmul
#5811 merged Feb 4, 2025
[INTERPRETER] Support tuple arguments in interpreter
#5790 merged Feb 4, 2025
[TEST] Use a fresh triton cache dir for warning tests
#5809 merged Feb 4, 2025
[LAYOUTS] Fix TransOp::fold
#5807 merged Feb 4, 2025
[PIPELINE] Remove outer loop pipelining transformation
#5766 merged Feb 4, 2025
[PROTON] Fix incorrect tmp_path initialization
#5803 merged Feb 4, 2025
[BACKEND] Disable ldmatrix.trans for fp8
#5800 merged Feb 4, 2025
[LAYOUTS] Don't hoist into ifs outside of loops
#5801 merged Feb 4, 2025
[LAYOUTS] Remove HoistLayoutConversion in favour of backwardsRemat
#5788 merged Feb 3, 2025
[NFC] Move element bit width into NVMMASharedEncoding
#5794 merged Feb 3, 2025
[INTERP] Support tensor descriptor ops
#5795 merged Feb 3, 2025
[PROTON] Parallelize proton tests
#5792 merged Feb 3, 2025
[PROTON-DEV] Improve profile interface
#5793 merged Feb 2, 2025
[AMD] NFC: Drop unused constructors in elementwise patterns
#5789 merged Feb 2, 2025
[BACKEND] Refactor shared memory layout representation
#5786 merged Feb 1, 2025
Use os.sep in filter_traceback function
#5781 merged Feb 1, 2025
Tutorial 09 Descriptor Kernel
#5779 merged Feb 1, 2025
[Proton] Fixed pc sampling error
#5787 merged Feb 1, 2025
[BC Breaking] Add output dtype to tl.sum with default
#5763 merged Feb 1, 2025
[Pipeliner] Fix condition for pipelining loads
#5780 merged Feb 1, 2025
[AMD] Add GFX950 fp32 to bf16 Conversion Ops
#5782 merged Feb 1, 2025
[BACKEND] bump llvm to ffe3129e9bdc146ee4d91e849173d1c64b1ae974
#5784 merged Feb 1, 2025
[Layouts] Remove sketchy remat condition
#5783 merged Feb 1, 2025
Do not reorder transpose of dot operand that is used in ops other than dotOp
#5686 merged Jan 31, 2025
[DEV] Don't use .ONESHELL in Makefile
#5775 merged Jan 31, 2025
[AMD] Emit AMD specific intrinsics for dot
#4594 merged Jan 31, 2025
[AMD] Rewrite canonicalize pointers to use 1:N conversion
#5329 merged Jan 31, 2025
[PROTON] Skip warnings caused by legacy clang compilers
#5778 merged Jan 31, 2025
Revert "[LAYOUTS] Generalise HoistLayoutConversion to work with arbit…
#5776 merged Jan 31, 2025
[TOOLS] Fixed bug in AOT compiler
#5771 merged Jan 31, 2025
Fix __builtin_clz implementation on Windows
#5774 merged Jan 31, 2025
[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops
#5673 merged Jan 31, 2025
[AMD][BACKEND] Bugfix to small tile pingpong
#5759 merged Jan 31, 2025
[PROTON] Explicitly list all cpp files
#5756 merged Jan 31, 2025
[ANALYSIS][DEBUG] Output theoretical vs actual peak memory allocation size
#5658 merged Jan 31, 2025
[DRIVER] Pass correct SM and PTX versions to llvm
#5770 merged Jan 31, 2025
[Triton] Change xor_sum to use @jit (NFC)
#5769 merged Jan 31, 2025
[DOC] Update core maintainers list
#5767 merged Jan 30, 2025
Use env builtin implementation from LLVM's lit utility for platform independence
#5762 merged Jan 30, 2025
[PROTON] Add the -diff option to proton-viewer
#5740 merged Jan 30, 2025
[BACKEND] Canonicalize ReshapeOp even if not allowing reorder
#5752 merged Jan 30, 2025
[DEV] Unify Makefile and cuda CI commands
#5753 merged Jan 30, 2025
[PIPELINE] Limit number of buffers for register operands
#5755 merged Jan 30, 2025
Reapply "[Layouts] Propagate layouts into conditionals (#5610)"
#5725 merged Jan 30, 2025
[Proton][Dialect] Middle-end Proton operator definitions
#5754 merged Jan 30, 2025
Improve thread locality for reduction ops (#5671)
#5757 merged Jan 30, 2025
[PROTON] Reworked the mechanism for finding libraries for profiling backends.
#5751 merged Jan 30, 2025
[Frontend][Diagnostics] Improve emitting diagnostic information
#5581 merged Jan 30, 2025
[LAYOUTS] Create a trait that implements Layout equality by comparing the LLs
#5747 merged Jan 29, 2025
[BACKEND] Limit vector size to scratch size for convert_layout
#5746 merged Jan 29, 2025
[backend] NFC: Split architecture dependant and independant parts of FMA dot conversion
#5655 merged Jan 29, 2025
[BACKEND] bump to llvm/llvm-project@c118864223c6
#5684 merged Jan 29, 2025
Optimize reduce(reshape_1D)
#5748 merged Jan 29, 2025
[AMD][BACKEND] Disable pingpong with non-local_load input.
#5718 merged Jan 29, 2025
Revert "[PROTON] Prefer the default library path when loading profiler backends"
#5749 merged Jan 29, 2025
Revert "[Coalesce] Fix the default order to be row major "
#5744 merged Jan 29, 2025
[NVIDIA] Use correct commit type for TMA
#5738 merged Jan 29, 2025
[BACKEND] Deprecate SharedToDotOperandMMAv2OrV3.cpp
#5734 merged Jan 29, 2025

13 Pull requests opened by 11 people

[AMD] Initial support for LDS transpose load instructions
#5750 opened Jan 29, 2025
[AMD] Added `ConcatOp` to AMDGPU Dialect
#5760 opened Jan 30, 2025
[release/3.2.x] Get proper PTX version for CUDA >= 12.6
#5765 opened Jan 30, 2025
[OPTIMIZER] Fix insertion location in HoistLayoutConversion pattern
#5772 opened Jan 31, 2025
[WIP] [AMD] Specific swizzling pattern for TN GEMMs
#5797 opened Feb 3, 2025
[WIP][DNR][Pipeliner] Pipeline prologue/epilogue loads
#5802 opened Feb 4, 2025
[AMD] Moved membar analysis to its dedicated pass
#5805 opened Feb 4, 2025
[AMD] refactored instruction `sched.hint`
#5808 opened Feb 4, 2025
[PIPELINE] Relax requirements for wgmma operand register pipelining
#5810 opened Feb 4, 2025
[Blackwell][TUTORIALS] Add tutorial 10-block-scaled-matmul.py
#5813 opened Feb 5, 2025
[BACKEND] bump to llvm/llvm-project@ffe3129e9bdc
#5814 opened Feb 5, 2025
Copy local files if variable is specified
#5815 opened Feb 5, 2025
[Blackwell][Clean up] Remove use of SharedMemoryObject on TMEM
#5817 opened Feb 5, 2025

10 Issues closed by 5 people

histogram allows the creation of non-power-of-two tensor sizes
#4826 closed Feb 4, 2025
Assertion error from linear layouts
#4727 closed Feb 4, 2025
Where can i get the triton cp38 win version?
#4732 closed Feb 4, 2025
Blocksparse.matmul result does not align with torch
#4709 closed Feb 4, 2025
Assertion failure in LinearLayoutConversions on H100s when num_warps=8
#5609 closed Feb 4, 2025
jit issue when INTERPRETER=1
#5056 closed Feb 2, 2025
import te raise error
#5722 closed Jan 31, 2025
Assertion error when lowering a reduce->reshape->reshape->broadcast pattern to LLIR
#5745 closed Jan 29, 2025
Triton does not really enable -ftz
#5735 closed Jan 29, 2025
Not able to install dependencies file. : triton=3.0.0
#5741 closed Jan 29, 2025

4 Issues opened by 4 people

Potential Bug in **_attn_fwd_tma** Function
#5816 opened Feb 5, 2025
Triton interpreter cannot handle parameters that alias
#5791 opened Feb 2, 2025
"ImportError: cannot import name 'backends' from 'triton.backends' (unknown location)" for triton installed from source
#5773 opened Jan 31, 2025
Misreport "Cannot have `return` statements inside `while` or `for`" if values returned by function are disregarded
#5768 opened Jan 30, 2025

18 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[AMD] AsyncCopyGlobalToLocal lowering to global.load.lds
#5729 commented on Feb 3, 2025 • 22 new comments
Fix assertion in ScanLowering for num_ctas>1
#5680 commented on Feb 5, 2025 • 3 new comments
[AMD] Enable pingpong scheduling by default
#5696 commented on Feb 1, 2025 • 2 new comments
[Proton][Dialect] Add Proton Device Memory Buffer Init and Allocate Pass
#5606 commented on Feb 5, 2025 • 2 new comments
[WIP][Pipeliner] Enable automatic loop fusion
#5726 commented on Feb 4, 2025 • 0 new comments
[WIP] Support shared encoding defined with linear layout
#5720 commented on Feb 1, 2025 • 0 new comments
[WIP][AMD] Add MFMA and WMMA layouts to LinearEncodingTest
#5698 commented on Jan 29, 2025 • 0 new comments
[Proton][Dialect] Middle-end support of the Proton Dialect and the frontend Python package
#5677 commented on Feb 5, 2025 • 0 new comments
[WIP] [AMD] Remove "remove unsupported conversions" pass
#5674 commented on Feb 3, 2025 • 0 new comments
Ensured that dtype subclasses are hashable
#5657 commented on Feb 4, 2025 • 0 new comments
[AMD] refactor convert buffer ops
#5563 commented on Feb 5, 2025 • 0 new comments
[MXFP] Implement SW emulation of dot_scale as a decomposition
#5475 commented on Feb 4, 2025 • 0 new comments
[WIP][SWP] Print recurring dependencies when reporting scheduling conflicts
#5375 commented on Feb 4, 2025 • 0 new comments
[AMD-Pipeline] Add multi-stage global/local prefetch
#5353 commented on Feb 5, 2025 • 0 new comments
Why is the documentation not versioned like other Read The Docs sites?
#4454 commented on Feb 4, 2025 • 0 new comments
[RFC] Improve performance for layer-norm in turtorial
#5712 commented on Feb 3, 2025 • 0 new comments
Is Triton unable to install in python 3.10 versions?
#1057 commented on Jan 31, 2025 • 0 new comments
[3.2.x] `ptx_get_version` cannot handle CUDA>12.6
#5737 commented on Jan 30, 2025 • 0 new comments