Insights: triton-lang/triton
Overview
64 Pull requests merged by 23 people
-
[mlir][dialect] Refactor DotLike trait into a DotOpInterface + Enable verification of scaled_dot
#5796 merged
Feb 5, 2025 -
Bump actions/checkout from 3 to 4
#5714 merged
Feb 5, 2025 -
Fix default num_stages values mismatch between Python frontend and MLIR.
#5804 merged
Feb 5, 2025 -
[Blackwell] Enable MMA pipelining for scaled dot when TMEM copy is used
#5812 merged
Feb 5, 2025 -
[Blackwell] Add support for mixed precision scaled dot
#5799 merged
Feb 4, 2025 -
[LAYOUTS] Choose between wgmma RS and SS within AccelerateMatmul
#5798 merged
Feb 4, 2025 -
[TUTORIAL] Apply remat perf fix to non-TMA persistent matmul
#5811 merged
Feb 4, 2025 -
[INTERPRETER] Support tuple arguments in interpreter
#5790 merged
Feb 4, 2025 -
[TEST] Use a fresh triton cache dir for warning tests
#5809 merged
Feb 4, 2025 -
[LAYOUTS] Fix TransOp::fold
#5807 merged
Feb 4, 2025 -
[PIPELINE] Remove outer loop pipelining transformation
#5766 merged
Feb 4, 2025 -
[PROTON] Fix incorrect tmp_path initialization
#5803 merged
Feb 4, 2025 -
[BACKEND] Disable `ldmatrix.trans` for fp8
#5800 merged
Feb 4, 2025 -
[LAYOUTS] Don't hoist into ifs outside of loops
#5801 merged
Feb 4, 2025 -
[LAYOUTS] Remove HoistLayoutConversion in favour of backwardsRemat
#5788 merged
Feb 3, 2025 -
[NFC] Move element bit width into NVMMASharedEncoding
#5794 merged
Feb 3, 2025 -
[INTERP] Support tensor descriptor ops
#5795 merged
Feb 3, 2025 -
[PROTON] Parallelize proton tests
#5792 merged
Feb 3, 2025 -
[PROTON-DEV] Improve profile interface
#5793 merged
Feb 2, 2025 -
[AMD] NFC: Drop unused constructors in elementwise patterns
#5789 merged
Feb 2, 2025 -
[BACKEND] Refactor shared memory layout representation
#5786 merged
Feb 1, 2025 -
Use `os.sep` in `filter_traceback` function
#5781 merged
Feb 1, 2025 -
Tutorial 09 Descriptor Kernel
#5779 merged
Feb 1, 2025 -
[Proton] Fixed pc sampling error
#5787 merged
Feb 1, 2025 -
[BC Breaking] Add output dtype to `tl.sum` with default
#5763 merged
Feb 1, 2025 -
[Pipeliner] Fix condition for pipelining loads
#5780 merged
Feb 1, 2025 -
[AMD] Add GFX950 fp32 to bf16 Conversion Ops
#5782 merged
Feb 1, 2025 -
[BACKEND] bump llvm to ffe3129e9bdc146ee4d91e849173d1c64b1ae974
#5784 merged
Feb 1, 2025 -
[Layouts] Remove sketchy remat condition
#5783 merged
Feb 1, 2025 -
Do not reorder transpose of dot operand that is used in ops other than dotOp
#5686 merged
Jan 31, 2025 -
[DEV] Don't use .ONESHELL in Makefile
#5775 merged
Jan 31, 2025 -
[AMD] Emit AMD specific intrinsics for dot
#4594 merged
Jan 31, 2025 -
[AMD] Rewrite canonicalize pointers to use 1:N conversion
#5329 merged
Jan 31, 2025 -
[PROTON] Skip warnings caused by legacy clang compilers
#5778 merged
Jan 31, 2025 -
Revert "[LAYOUTS] Generalise HoistLayoutConversion to work with arbit…
#5776 merged
Jan 31, 2025 -
[TOOLS] Fixed bug in AOT compiler
#5771 merged
Jan 31, 2025 -
Fix `__builtin_clz` implementation on Windows
#5774 merged
Jan 31, 2025 -
[LAYOUTS] Generalise HoistLayoutConversion to work with arbitrary layouts and chains of ops
#5673 merged
Jan 31, 2025 -
[AMD][BACKEND] Bugfix to small tile pingpong
#5759 merged
Jan 31, 2025 -
[PROTON] Explicitly list all `cpp` files
#5756 merged
Jan 31, 2025 -
[ANALYSIS][DEBUG] Output theoretical vs actual peak memory allocation size
#5658 merged
Jan 31, 2025 -
[DRIVER] Pass correct SM and PTX versions to llvm
#5770 merged
Jan 31, 2025 -
[Triton] Change `xor_sum` to use `@jit` (NFC)
#5769 merged
Jan 31, 2025 -
[DOC] Update core maintainers list
#5767 merged
Jan 30, 2025 -
Use `env` builtin implementation from LLVM's lit utility for platform independence
#5762 merged
Jan 30, 2025 -
[PROTON] Add the `-diff` option to `proton-viewer`
#5740 merged
Jan 30, 2025 -
[BACKEND] Canonicalize ReshapeOp even if not allowing reorder
#5752 merged
Jan 30, 2025 -
[DEV] Unify Makefile and cuda CI commands
#5753 merged
Jan 30, 2025 -
[PIPELINE] Limit number of buffers for register operands
#5755 merged
Jan 30, 2025 -
Reapply "[Layouts] Propagate layouts into conditionals (#5610)"
#5725 merged
Jan 30, 2025 -
[Proton][Dialect] Middle-end Proton operator definitions
#5754 merged
Jan 30, 2025 -
Improve thread locality for reduction ops (#5671)
#5757 merged
Jan 30, 2025 -
[PROTON] Reworked the mechanism for finding libraries for profiling backends.
#5751 merged
Jan 30, 2025 -
[Frontend][Diagnostics] Improve emitting diagnostic information
#5581 merged
Jan 30, 2025 -
[LAYOUTS] Create a trait that implements Layout equality by comparing the LLs
#5747 merged
Jan 29, 2025 -
[BACKEND] Limit vector size to scratch size for convert_layout
#5746 merged
Jan 29, 2025 -
[backend] NFC: Split architecture dependant and independant parts of FMA dot conversion
#5655 merged
Jan 29, 2025 -
[BACKEND] bump to llvm/llvm-project@c118864223c6
#5684 merged
Jan 29, 2025 -
Optimize reduce(reshape_1D)
#5748 merged
Jan 29, 2025 -
[AMD][BACKEND] Disable pingpong with non-local_load input.
#5718 merged
Jan 29, 2025 -
Revert "[PROTON] Prefer the default library path when loading profiler backends"
#5749 merged
Jan 29, 2025 -
Revert "[Coalesce] Fix the default order to be row major "
#5744 merged
Jan 29, 2025 -
[NVIDIA] Use correct commit type for TMA
#5738 merged
Jan 29, 2025 -
[BACKEND] Deprecate `SharedToDotOperandMMAv2OrV3.cpp`
#5734 merged
Jan 29, 2025
13 Pull requests opened by 11 people
-
[AMD] Initial support for LDS transpose load instructions
#5750 opened
Jan 29, 2025 -
[AMD] Added `ConcatOp` to AMDGPU Dialect
#5760 opened
Jan 30, 2025 -
[release/3.2.x] Get proper PTX version for CUDA >= 12.6
#5765 opened
Jan 30, 2025 -
[OPTIMIZER] Fix insertion location in HoistLayoutConversion pattern
#5772 opened
Jan 31, 2025 -
[WIP] [AMD] Specific swizzling pattern for TN GEMMs
#5797 opened
Feb 3, 2025 -
[WIP][DNR][Pipeliner] Pipeline prologue/epilogue loads
#5802 opened
Feb 4, 2025 -
[AMD] Moved membar analysis to its dedicated pass
#5805 opened
Feb 4, 2025 -
[AMD] refactored instruction `sched.hint`
#5808 opened
Feb 4, 2025 -
[PIPELINE] Relax requirements for wgmma operand register pipelining
#5810 opened
Feb 4, 2025 -
[Blackwell][TUTORIALS] Add tutorial 10-block-scaled-matmul.py
#5813 opened
Feb 5, 2025 -
[BACKEND] bump to llvm/llvm-project@ffe3129e9bdc
#5814 opened
Feb 5, 2025 -
Copy local files if variable is specified
#5815 opened
Feb 5, 2025 -
[Blackwell][Clean up] Remove use of SharedMemoryObject on TMEM
#5817 opened
Feb 5, 2025
10 Issues closed by 5 people
-
histogram allows the creation of non-power-of-two tensor sizes
#4826 closed
Feb 4, 2025 -
Assertion error from linear layouts
#4727 closed
Feb 4, 2025 -
Where can i get the triton cp38 win version?
#4732 closed
Feb 4, 2025 -
Blocksparse.matmul result does not align with torch
#4709 closed
Feb 4, 2025 -
Assertion failure in LinearLayoutConversions on H100s when num_warps=8
#5609 closed
Feb 4, 2025 -
jit issue when INTERPRETER=1
#5056 closed
Feb 2, 2025 -
import te raise error
#5722 closed
Jan 31, 2025 -
Assertion error when lowering a reduce->reshape->reshape->broadcast pattern to LLIR
#5745 closed
Jan 29, 2025 -
Triton does not really enable -ftz
#5735 closed
Jan 29, 2025 -
Not able to install dependencies file. : triton=3.0.0
#5741 closed
Jan 29, 2025
4 Issues opened by 4 people
-
Potential Bug in **_attn_fwd_tma** Function
#5816 opened
Feb 5, 2025 -
Triton interpreter cannot handle parameters that alias
#5791 opened
Feb 2, 2025
18 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[AMD] AsyncCopyGlobalToLocal lowering to global.load.lds
#5729 commented on
Feb 3, 2025 • 22 new comments -
Fix assertion in ScanLowering for num_ctas>1
#5680 commented on
Feb 5, 2025 • 3 new comments -
[AMD] Enable pingpong scheduling by default
#5696 commented on
Feb 1, 2025 • 2 new comments -
[Proton][Dialect] Add Proton Device Memory Buffer Init and Allocate Pass
#5606 commented on
Feb 5, 2025 • 2 new comments -
[WIP][Pipeliner] Enable automatic loop fusion
#5726 commented on
Feb 4, 2025 • 0 new comments -
[WIP] Support shared encoding defined with linear layout
#5720 commented on
Feb 1, 2025 • 0 new comments -
[WIP][AMD] Add MFMA and WMMA layouts to LinearEncodingTest
#5698 commented on
Jan 29, 2025 • 0 new comments -
[Proton][Dialect] Middle-end support of the Proton Dialect and the frontend Python package
#5677 commented on
Feb 5, 2025 • 0 new comments -
[WIP] [AMD] Remove "remove unsupported conversions" pass
#5674 commented on
Feb 3, 2025 • 0 new comments -
Ensured that dtype subclasses are hashable
#5657 commented on
Feb 4, 2025 • 0 new comments -
[AMD] refactor convert buffer ops
#5563 commented on
Feb 5, 2025 • 0 new comments -
[MXFP] Implement SW emulation of dot_scale as a decomposition
#5475 commented on
Feb 4, 2025 • 0 new comments -
[WIP][SWP] Print recurring dependencies when reporting scheduling conflicts
#5375 commented on
Feb 4, 2025 • 0 new comments -
[AMD-Pipeline] Add multi-stage global/local prefetch
#5353 commented on
Feb 5, 2025 • 0 new comments -
Why is the documentation not versioned like other Read The Docs sites?
#4454 commented on
Feb 4, 2025 • 0 new comments -
[RFC] Improve performance for layer-norm in turtorial
#5712 commented on
Feb 3, 2025 • 0 new comments -
Is Triton unable to install in python 3.10 versions?
#1057 commented on
Jan 31, 2025 • 0 new comments -
[3.2.x] `ptx_get_version` cannot handle CUDA>12.6
#5737 commented on
Jan 30, 2025 • 0 new comments