[pull] main from openxla:main #5

Open · wants to merge 2,639 commits into base: main

Changes shown from 1 commit (of 2,639 commits)
c97b084
Including the .kd symbol suffix in AMDGPU executables.
benvanik Jan 7, 2025
d517661
[runtime][python] Add debug sink to bindings (#19013)
sogartar Jan 7, 2025
1ccabe5
Adding COMPILER_TARGET_DEVICE to iree_hal_cts_test_suite.
benvanik Jan 7, 2025
c9fb739
Fixing HAL driver CTS test to not assume numerical indices exist.
benvanik Jan 7, 2025
a8f7a32
Adding iree_hal_queue_affinity_* utilities.
benvanik Jan 7, 2025
4a04c0a
Adding minor iree/base/ time, string view, and memory utilities.
benvanik Jan 7, 2025
ea462c8
Removing some IREE_RETURN_AND_END_ZONE_IF_ERROR usage that was ugly.
benvanik Jan 7, 2025
2199c1d
Adding iree_arena_block_pool_preallocate.
benvanik Jan 7, 2025
66723e4
Cleaning up null HAL driver options.
benvanik Jan 7, 2025
aa06523
[NFC] Comment fixes in iree_bitcode_library.
benvanik Jan 7, 2025
7047cc3
Rollup of minor runtime fixes/cleanup from the AMDGPU branch. (#19621)
benvanik Jan 7, 2025
9a83239
[GPU] Add chained reshape support for scf.forall expand destination p…
nirvedhmeshram Jan 7, 2025
349026b
Add explicit tolerances to SDXL benchmark test times. (#19628)
ScottTodd Jan 7, 2025
550d88e
[GPU] Add lowering configuration logic for scatter (#19624)
qedawkins Jan 7, 2025
80cbf6b
[GPU] Add a pass to convert accumulating GEMMs to GEMMs (#19587)
nirvedhmeshram Jan 7, 2025
a5c3879
Reapply "Propagate reshapes through generics with reduction… (#18968)
IanWood1 Jan 8, 2025
fb21dd6
Adding experimental Tracy API for TLS-less event recording. (#19625)
benvanik Jan 8, 2025
be75a30
Update minor Python versions used to build packages (#19632)
marbre Jan 8, 2025
7b9aa28
When dumping intermediates, dump how to reproduce the `.optimized.ll`…
bjacob Jan 8, 2025
c75b686
[GPU][Codegen] Allowing mfma for narrow problem config sizes (#19615)
jerryyin Jan 8, 2025
9b4906e
[DispatchCreation] Drop fusion restriction for stride != 1 conv (#19634)
qedawkins Jan 8, 2025
c484058
[GPU] Add barriers when resolving GPUMappedForall to fix race conditi…
nirvedhmeshram Jan 8, 2025
af416b3
Bump version to 3.2.0 after releasing 3.1.0. (#19638)
ScottTodd Jan 8, 2025
126f0ac
Add docs for updating release git tags manually. (#19637)
ScottTodd Jan 8, 2025
2347d9f
Supporting (and renaming) IREE_HAL_WHOLE_BUFFER in binding table reso…
benvanik Jan 8, 2025
74f8d3c
[LinalgExt] Scatter fusion by expansion 3/3 (#19588)
IanWood1 Jan 9, 2025
02d145e
[Stream] Implement SpecializeEncodings pass (1/n) (#19502)
hanhanW Jan 9, 2025
82e37d6
Fix (cross) compiling for 32-bit targets (#19644)
marbre Jan 9, 2025
9055c9d
[hip] Fix race in the cleanup of queue read operations. (#19645)
AWoloszyn Jan 9, 2025
a7bac5d
[Flow] Fix dispatch naming for dynamic shaped fusions (#19439)
qedawkins Jan 9, 2025
bb1c561
Erase all address spaces and get inlined ukernels (#19646)
bjacob Jan 9, 2025
6d6bd6e
[runtime] Fix runtime tracing compile failure on gcc (#19642)
IanWood1 Jan 9, 2025
b3ff1ed
Rename `unroll_{m,n,k}` to `intrinsics_{m,n,k}` (#19652)
bjacob Jan 9, 2025
7d21c5d
Revert (2nd) of "Propagate reshapes through generics with reduction" …
MaheshRavishankar Jan 9, 2025
801e2c1
Expand runtime_tracing job to include Windows and macOS. (#19655)
ScottTodd Jan 9, 2025
c793f90
[i1] Implement `packed_storage` layout encoding attribute (#19354)
lialan Jan 10, 2025
6245db1
[Stream] Attach layouts to tensor ops in encoding specialization pass…
hanhanW Jan 10, 2025
2aca091
[Codegen][Nearly NFC] Move PropagateDispatchSizeBounds to Common/ (#1…
krzysz00 Jan 10, 2025
e64cb12
Increase strictness of global isel use for ROCM (#19247)
tpopp Jan 10, 2025
a88555c
Add macOS workflow running on M1 (#19656)
marbre Jan 10, 2025
106371d
Bump torch-mlir to f92c587cb6150e73078f32cf847dc3892be16f93 (#19659)
jinchen62 Jan 10, 2025
1d91bec
Supporting file descriptors in iree_io_stream_open. (#19665)
benvanik Jan 10, 2025
039b8b4
Using tracy::GetQueue instead of the sketchy static variable referenc…
benvanik Jan 10, 2025
f7a2157
Remove Upcasting schedule from TileAndFuse (#19669)
nirvedhmeshram Jan 10, 2025
9f93691
[LLVMGPU] Use LLVMGPUDistribute for small input scatters (#19670)
qedawkins Jan 10, 2025
a583b25
[GPU] Teach GPUApplyTilingLevel PartialReduction tiling (#19682)
Groverkss Jan 12, 2025
1441caa
Enable macOS Tracy CI build. (#19668)
ScottTodd Jan 13, 2025
ae50c5e
[DOCS] Update VectorExt::NestedLayoutAttr docs (#19246)
manupak Jan 13, 2025
88d5f59
Update PkgCI test_amd to use MI300x conductor cluster (#19517)
yamiyysu Jan 13, 2025
40c19e3
Better support multidevice placement with `stream.async.barrier` (#19…
rsuderman Jan 13, 2025
cac7a96
Update IREE test suite to use iree-org/iree-test-suites@c47d13c (#19617)
MaheshRavishankar Jan 13, 2025
d90c505
Reshape propagation to enable broadcast(transpose) -> attention(q, kt…
MaheshRavishankar Jan 13, 2025
9b35412
Run on schedule in iree-org only (#19685)
marbre Jan 13, 2025
2b29155
Update GH actions with Dependabot (#19663)
marbre Jan 13, 2025
2452b22
[Codegen][GPU] Let integer range optimization narrow GPU computations…
krzysz00 Jan 13, 2025
3978ce6
Increase default threshold of TileLargeTensor pass (#19671)
nirvedhmeshram Jan 13, 2025
3e34e03
Bump the github-actions group with 8 updates (#19689)
dependabot[bot] Jan 13, 2025
158c636
Revert "Increase default threshold of TileLargeTensor pass (#19671)" …
nirvedhmeshram Jan 13, 2025
8d1d867
[GPU] Add thread tile size inference for scatter (#19694)
qedawkins Jan 14, 2025
21b0101
[GPU] Disable prefetching for loops with no computation (#19695)
nirvedhmeshram Jan 14, 2025
01c9f14
[LLVMGPUVectorDistribute] Add support for inter-subgroup multi_reduct…
manupak Jan 14, 2025
3c963dd
Update PyTorch sample notebooks using latest iree-turbine code. (#19658)
ScottTodd Jan 14, 2025
6fd0fd0
[LinalgExt] Implement PartialReductionOpInterface for OnlineAttention…
Groverkss Jan 14, 2025
c320935
Bump dawidd6/action-download-artifact from 3.1.4 to 7 in the github-a…
dependabot[bot] Jan 14, 2025
a953763
Temporarily Disable MI250 workflow due to machine outage (#19702)
iamakanshab Jan 14, 2025
27e7a90
[DT][Encoding] Use layouts to calculate storage size when it is prese…
hanhanW Jan 15, 2025
3c95042
Re-enable MI250 workflows. (#19705)
saienduri Jan 15, 2025
c285d58
Copy sample code into samples/dynamic_shapes/README.md. (#19699)
ScottTodd Jan 15, 2025
3032df2
Fix newlines in markdown mermaid.js diagrams. (#19657)
ScottTodd Jan 15, 2025
5ee9b27
Clean up encoding-related code. NFC. (#19717)
kuhar Jan 16, 2025
f6f6388
[Codegen] Add workgroups reordering to distribute using forall (#19681)
pashu123 Jan 16, 2025
3e8c81c
Remove legacy sync path (#19714)
rsuderman Jan 16, 2025
6f33cd4
[Stream] Specialize encoding for TensorPhaseOp that have result_encod…
hanhanW Jan 16, 2025
08b44e2
[hip] Try again to fix the semaphore busy loop. (#19712)
AWoloszyn Jan 16, 2025
36c2353
[Codegen] Push up the extract slice op (#19680)
pashu123 Jan 16, 2025
dde5992
[Codegen] Allow memref type propagation through collapse_shape (#19400)
Max191 Jan 17, 2025
c1cc4cc
[LLVMGPU] Add pass to distribute undistributed copies to threads (#19…
qedawkins Jan 17, 2025
75c9e86
[GPU] Avoid fusing slices of already tiled ops (#19404)
Max191 Jan 17, 2025
f31cc72
Update resource placement and transfer for barrier operations (#19725)
rsuderman Jan 17, 2025
e1f010c
[Dispach] Clone chain of ops into dispatch (#19723)
IanWood1 Jan 17, 2025
c5bc37f
[Util] Fix OptimizeIntArithmetic pattern failure condition (#19731)
Groverkss Jan 17, 2025
4d3f06a
[VectorDistribution] Clone vector.step on layout conflict (#19732)
Groverkss Jan 17, 2025
b08d152
[HAL] Use util.assume.int for memref alignments (#19691)
krzysz00 Jan 17, 2025
b4c7de6
[Python] Enable building Python bindings as editable wheels, document…
krzysz00 Jan 17, 2025
6052a1d
Bump to llvm/llvm-project@3f1486f (#19683)
MaheshRavishankar Jan 17, 2025
be8e3d2
Convert barriers into copies during allocation (#19735)
rsuderman Jan 18, 2025
db129e5
Update torch-mlir to llvm/torch-mlir@f42c7e4 (#19736)
zjgarvey Jan 20, 2025
a64d713
Implement ValueBoundsOpInterface on HAL ID/count ops, util.assume.int…
krzysz00 Jan 20, 2025
9aae362
[Codegen] Sprinkle in PropagateDispatchSizeBounds passes (#19677)
krzysz00 Jan 20, 2025
4c600ca
[Codegen] Refactor RemoveSingleIteratinLoop to use ValueBoundsOpInte…
krzysz00 Jan 20, 2025
88887e7
Add ubuntu-24.04-arm runtime and runtime_tracing CI jobs. (#19724)
ScottTodd Jan 20, 2025
8ba420d
Bump sarisia/actions-status-discord from 1.15.1 to 1.15.2 in the gith…
dependabot[bot] Jan 20, 2025
4a7af87
Integrate llvm 1_20_2025 (#19740)
nirvedhmeshram Jan 20, 2025
38ca3be
[GPU] Add SwapExpandShapeWithSlice pattern to loop fusion pass (#19729)
Max191 Jan 21, 2025
2e21a9a
[LLVMGPU] Enable scf.forall distr. on vectorDistribute Pipeline (#19420)
pashu123 Jan 21, 2025
e154e8b
[hip] Enable caching in the hip async allocator (#19667)
AWoloszyn Jan 21, 2025
21d5db3
[Codegen] Use linearize_index op when swapping slice and expand (#19730)
Max191 Jan 21, 2025
4c0ba9c
[DispatchCreation] Add constant expression hoisting (#19750)
qedawkins Jan 21, 2025
b47fbdf
[LinalgExt] Update scatter to allow dropping unit dims (#19704)
IanWood1 Jan 21, 2025
3e15a5a
[Global Opt] Add option to generalize matmul ops (#19741)
IanWood1 Jan 21, 2025
1cd62fd
[DT][GPU] Permute cross-thread dims of TileSwizzle to outermost (#19734)
Max191 Jan 21, 2025
eb21715
Bump StableHLO to openxla/stablehlo@23d7f60. (#19754)
ScottTodd Jan 21, 2025
8c7eeca
[DispatchCreation] Enable Rope computation fusion with attention. (#1…
MaheshRavishankar Jan 22, 2025
525389c
[LLVMGPU] Enable forall distr on the gpuvectorization pass pipeline (…
pashu123 Jan 22, 2025
cac390d
Fix or work around gcc-14 warnings/errors. (#19758)
ScottTodd Jan 22, 2025
26dcb8e
Fixes for CMake 3.31 policy changes. (#19759)
ScottTodd Jan 22, 2025
6933c39
[Codegen] add mi308x target (#19756)
bangtianliu Jan 22, 2025
03c5a0f
[LLVMGPUVectorDistribute] Refactor vector.contract distribute (#19631)
manupak Jan 22, 2025
ba30557
[runtime][hip] Cast IREE_[HOST|DEVICE]_MAX_SIZE to iree_host_size_t t…
hanhanW Jan 22, 2025
a430695
[GPU] Add pattern to fuse tensor.collapse_shape into forall producer …
Max191 Jan 22, 2025
278e63a
Move gcc dangling-reference check to GCC>=13. (#19772)
ScottTodd Jan 22, 2025
2cd88d5
Allow Scatter to have mismatched static/dynamic dims (#19774)
IanWood1 Jan 22, 2025
710c22a
[FLOW] Move InitializeEmptyTensors before CaptureDynamicDims (#19563)
zjgarvey Jan 23, 2025
02d3f46
[LinalgExt][NFC] Update scatter docs (#19755)
IanWood1 Jan 23, 2025
3af03b1
[Dispatch] Fix issue with bubbling of extract/expand shape (#19776)
IanWood1 Jan 23, 2025
d6b2b0d
Fixing broken debug line in ReferencePartitioning.
benvanik Jan 23, 2025
6aedfd3
[NFC][Vectorization] Refactor vector size inference out of the pass (…
manupak Jan 23, 2025
c04a013
[Preprocessing] Adding conv filter to channel last pass in preprocess…
jerryyin Jan 23, 2025
a7334f4
[LLVMGPU] Add PartialReduction tiling to LLVMGPUVectorDistribution pi…
Groverkss Jan 23, 2025
16739f3
[VectorDistribution] Allow f16 vector.multi_reduction distribution (#…
Groverkss Jan 23, 2025
9b26aef
[VectorExt] Add support for masking for toLayout vectorization (#19728)
manupak Jan 23, 2025
803cd86
[GPU] Preserve lowering_config after tiling in GPUApplyTilingLevel (#…
Groverkss Jan 23, 2025
1fa4b48
[Dispatch] Don't fuse when trunc is used by multiple contractions #19779
IanWood1 Jan 23, 2025
561ae02
[DT] Teach MaterializeEncoding pass to use resolved layouts, if any. …
hanhanW Jan 23, 2025
49eb5c5
[LLVMGPUVectorDistribute] add support for inferred dynamic shapes in …
manupak Jan 23, 2025
f431d4e
[hip] Set up the release callback for hip buffers. (#19787)
AWoloszyn Jan 23, 2025
74a2eda
Bump to LLVM with nanobind fix. (#19789)
marbre Jan 23, 2025
67879e8
Port compiler and iree-dialects to nanobind (#19790)
marbre Jan 23, 2025
6adf5b8
[DispatchCreation] Always clone attention mask generator (#19733)
Groverkss Jan 23, 2025
af54fd2
[Stream][NFC] Iterate over blocks to find return op (#19791)
IanWood1 Jan 23, 2025
c750b28
Add -U PyClassMethodNew (#19796)
marbre Jan 23, 2025
dd31890
[Stream][NFC] Move topological sort out of loop (#19794)
IanWood1 Jan 23, 2025
88e5467
[LLVMGPUVectorDistribute] Fix workgroup reduction to use in_bounds re…
manupak Jan 24, 2025
1a22a09
Include missing dependency to @llvm-project//mlir:Support (#19782)
pbarrera Jan 24, 2025
4215100
Skipping generic from root op when it computes slice indices (#19767)
jerryyin Jan 24, 2025
c52eb68
[LLVMGPU] Fix lowering of functions that don't use all bindings (#19773)
krzysz00 Jan 24, 2025
bbe7f5c
Adding `--iree-scheduling-initialization-mode=` flag. (#19778)
benvanik Jan 24, 2025
ec6e00b
[Codegen][Tuner] Add support for per-sku tuning spec (#19762)
bangtianliu Jan 24, 2025
ea599e2
Add SDXL regression test on mi308 (#19747)
IanNod Jan 25, 2025
6ec861f
Bump llvm to llvm/llvm-project@95d993a (#19811)
nirvedhmeshram Jan 25, 2025
9201e85
Verifying that vm.import ops have a module.func separator. (#19808)
benvanik Jan 25, 2025
5e4dc73
[ROCM] Add radeon pro workstation cards to known targets (#19815)
kuhar Jan 25, 2025
ccb0dea
Clean up named attribute construction. NFC. (#19813)
kuhar Jan 25, 2025
73a6307
[Codegen] Rename tuning application test flag (#19816)
kuhar Jan 25, 2025
1648af4
[docs] Overhaul amd gpu target options (#19814)
kuhar Jan 25, 2025
0082e27
[docs] Add workstation cards to amd gpu table (#19818)
kuhar Jan 25, 2025
ebb9615
[GPU] Force distribution along workgroups for scatter (#19784)
qedawkins Jan 27, 2025
3fdace5
[Dispatch] Enable fusing producers with scatter (#19775)
IanWood1 Jan 27, 2025
3f3c69b
[Codegen] Swap to IREE's extract_strided_subview patterns (#19757)
krzysz00 Jan 27, 2025
9a34131
Cherry-pick fix for torch-mlir build on MSVC. (#19823)
ScottTodd Jan 27, 2025
d72a7a1
Adding `stream.tensor.dispatch` op. (#19817)
benvanik Jan 27, 2025
1bf7249
[hip] Cleanup the dispatch thread before the rest of the device. (#19…
AWoloszyn Jan 27, 2025
32bb478
Bump llvm to llvm/llvm-project@aa34a6ab (#19824)
pashu123 Jan 27, 2025
7e6c5ec
Switch to upstream StablehloToLinalg code. (#19792)
ScottTodd Jan 27, 2025
4b0ca34
Support fusing broadcast transposes with attention (#19828)
IanWood1 Jan 28, 2025
103d631
[LLVMGPU] Correct the workgroup level tile sizes for WarpReduction (#…
pashu123 Jan 28, 2025
22b34b5
Bump the github-actions group with 2 updates (#19827)
dependabot[bot] Jan 28, 2025
86ad063
[VectorExt] make fold unit dims work with dynamic shape (#19771)
manupak Jan 28, 2025
aa9f8c5
Remove barriers post execution scheduling (#19742)
rsuderman Jan 28, 2025
ecd67d9
[GPU] Use affine.delinearize_index for MMA tiles and vector distribut…
krzysz00 Jan 28, 2025
6a5c12e
[GPU] Add pattern to fuse tensor.extract_slice into forall producer …
Max191 Jan 28, 2025
9870a6d
Revert "Support fusing broadcast transposes with attention" (#19835)
IanWood1 Jan 28, 2025
2f91d11
[NFC] Clarify comments in BubbleUpExpand shapes pass. (#19837)
MaheshRavishankar Jan 28, 2025
60c3c40
[GPU][Codegen] Setting padding size of external reduction dimensions …
jerryyin Jan 29, 2025
9cca17a
[GPU] Add transpose to set of generalized named ops (#19845)
Max191 Jan 29, 2025
60f3cc6
Disable `misc-use-anonymous-namespace` (#19831)
IanWood1 Jan 29, 2025
6e90b9f
Bump tracy to wolfpld/tracy@5479a42 and update local build system. (#…
ScottTodd Jan 29, 2025
3f713f5
[ROCm] Add mi325x to known targets (#19846)
kuhar Jan 29, 2025
50a7087
Add pattern to convert generic conv ops to IGEMM (#19798)
nirvedhmeshram Jan 29, 2025
36e7593
[GPU] Allow vectorization for dynamic shapes with inner static dims (…
nirvedhmeshram Jan 30, 2025
10e66bc
[infra] Run parameterized ONNX model tests across CPU, Vulkan, and HI…
ScottTodd Jan 31, 2025
b9555fc
Fixing typo in schedule allocation alias tracking. (#19869)
benvanik Jan 31, 2025
0159762
[Codegen][GPU] Finish splitting NV intrinsics from AMD ones (#19853)
qedawkins Jan 31, 2025
4693b1c
[AMDGPU] Use shared memory in multi_mma ukernel (#19786)
bjacob Jan 31, 2025
6a9f2a6
Add support for executable duplication in encoding specialization pas…
hanhanW Feb 3, 2025
84a1746
fix(iree-opt): add missing passes and dialects for CUDA (#19867)
chrsmcgrr Feb 3, 2025
1627c6b
Add links to new iree-turbine documentation website. (#19870)
ScottTodd Feb 3, 2025
427cdfe
[Encoding] Implement [Un]specialized encodings for testing purpose. (…
hanhanW Feb 3, 2025
e14d6cd
[CPU] Add support for parsing AArch64 cpu features. (#19881)
hanhanW Feb 3, 2025
78f312b
[Stream] Update executable functions in encoding specialization pass.…
hanhanW Feb 3, 2025
86244de
[Encoding][Codegen] Add initial pad encoding layout attrs (#19865)
kuhar Feb 3, 2025
aff23ae
[GPU] Skip tiling large transposes and copies (#19887)
qedawkins Feb 3, 2025
539606c
[NFC][GPU] Remove named op IGEMM lowering (#19885)
nirvedhmeshram Feb 3, 2025
d661efa
[GPU] Use tile and fuse for matmul after vector distribute by default…
nirvedhmeshram Feb 3, 2025
1ed6350
[Codegen] Use affine.delinearize_index in workgroup distribution (#19…
krzysz00 Feb 4, 2025
d96a3f0
[GPU] Match Tile And Fuse skinny matmul bail-out to Vector Distribute…
nirvedhmeshram Feb 4, 2025
4fffb0e
Fixing execution region result placement. (#19872)
benvanik Feb 4, 2025
d444ab4
[CPU] Aarch64: actually do `reserve-x18` (#19895)
bjacob Feb 4, 2025
2733115
Bump llvm to llvm/llvm-project@a58e774fba42e13aa00667d644e96b783fc914…
pashu123 Feb 4, 2025
d7c6c7b
[CPU] Remove casting math.powf from fp16 to fp32 (#19844)
ita9naiwa Feb 4, 2025
002e637
[NFC][Im2Col] Move im2col to affine.delinearize_index (#19840)
krzysz00 Feb 4, 2025
3fafd30
Fix mapping of CPU to CPU features on Arm64. (#19900)
bjacob Feb 4, 2025
82255c7
[Encoding] Implement sizeof for pad encoding layout attr (#19890)
kuhar Feb 4, 2025
eb19497
Enable ukernels on remaining aarch64 targets (#19901)
bjacob Feb 4, 2025
a808900
Better messages on unknown CPU names. (#19902)
bjacob Feb 4, 2025
69b3ebb
[NFC] Add individual switches to polynomial approximation (#19697)
lialan Feb 4, 2025
4923c53
Bump the github-actions group with 2 updates (#19894)
dependabot[bot] Feb 4, 2025
13c2964
Silence false-positive warnings about implicit CPU fallback. (#19907)
bjacob Feb 5, 2025
dc4e900
[Util] Update Utils.cpp to address compilation error (#17593) (#19908)
ita9naiwa Feb 5, 2025
56bb652
Disable Attention V operand transposition. (#19810)
MaheshRavishankar Feb 5, 2025
0a2862c
[NFC][GPU] Simplify definitions of MMA attributes (#19905)
qedawkins Feb 5, 2025
3dfc486
Don't hoist bit-extend op as a leaf (#19871)
IanWood1 Feb 5, 2025
f3bef2d
Refresh TFLite guide with working code and TOSA status. (#19916)
ScottTodd Feb 5, 2025
86b845b
Bump llvm to llvm/llvm-project@c06d0ff806b7 (#19903)
hanhanW Feb 6, 2025
dfdc065
[Codegen][LLVMGPU] Add pass to materialize pad encoding (#19909)
kuhar Feb 6, 2025
d2c4d83
[runtime][python] add f8 element types (#19928)
sogartar Feb 6, 2025
25ec84c
[Integrate] Add arg/res_attrs to ops that implement CallOpInterface. …
hanhanW Feb 6, 2025
da4bdca
Refresh compile/run examples in deployment configuration guides. (#19…
ScottTodd Feb 6, 2025
ac46df5
[CUDA][Integrate] Switch CUDATarget to ptx_kernel cc. (#19925)
hanhanW Feb 6, 2025
5f7b471
[Stream] Add layouts to encodings for all stream tensor AffinityOp. (…
hanhanW Feb 6, 2025
535c063
Integrate llvm to llvm/llvm-project@e470dcae8d2c41 (#19927)
hanhanW Feb 7, 2025
b5b943a
Integrate llvm to llvm/llvm-project@a1984ec5eab09f (#19934)
hanhanW Feb 7, 2025
c2e13e8
[Codegen][GPU] Also don't tile large fills (#19937)
qedawkins Feb 7, 2025
0781072
LLVM: cherry-pick 73f11ac (#19939)
bjacob Feb 8, 2025
e4c683f
Integrate LLVM at 73f11ac (#19941)
bjacob Feb 10, 2025
21234ed
[Codegen][GPU] Infer workgroup size multiples from producers and cons…
Max191 Feb 10, 2025
624a9fa
[Preprocessing] Fix bug in TD dag matching op (#19945)
Max191 Feb 10, 2025
458efb4
Skip flaky iree-dump-parameters test on Windows. (#19915)
ScottTodd Feb 10, 2025
31fc602
Enable CUDA and ROCm backends in ci_linux_x64_clang.yml. (#19883)
ScottTodd Feb 10, 2025
364cb0a
Integrate LLVM at af2d82 (#19946)
bjacob Feb 10, 2025
91f59ee
Enable iree-test-deps on macos-14 (#19940)
hanhanW Feb 10, 2025
d81bb13
Bump version to 3.3.0 after releasing 3.2.0. (#19951)
ScottTodd Feb 10, 2025
4bc495b
[Codegen] Always use ? for non-zero offsets (#19952)
krzysz00 Feb 11, 2025
3c0ff87
Integrate LLVM at 001ba42f (#19954)
bjacob Feb 11, 2025
93eb7c8
[LLVMGPU] Add initial kernel config for horizontally fused gemms. (#1…
MaheshRavishankar Feb 11, 2025
0a5ea45
[NFC] Remove unused CAPI dependencies added by #18840 (#19949)
fabianmcg Feb 11, 2025
8db4e38
Enable ccache for PJRT pkgci workflow (#19944)
PragmaTwice Feb 11, 2025
e5a5881
Fixes for parentheses warnings on gcc/clang. (#19957)
ScottTodd Feb 11, 2025
0b6af4e
[Stream] Remove unused variables (i.e., dead code) and clean up tests…
hanhanW Feb 11, 2025
3fce185
Disable ubuntu-24.04-arm workflows until the runners are stable. (#19…
ScottTodd Feb 12, 2025
7c0259c
Refactor `PolynomialApproximationPass` into `MathTransformPass`. (#19…
bjacob Feb 12, 2025
06eaead
[Codegen] Run LoopCoalescingPass at the end of warp reduce (#19950)
IanWood1 Feb 12, 2025
99304ff
[Flow] Fix cloning of `flow.tensor.transfer` into dispatch (#19838)
IanWood1 Feb 12, 2025
d3cfe11
[GPU] Set insertion point to last slice index operand in reshape and …
Max191 Feb 12, 2025
d11b876
[Stream] Enable batch affinity queries in SpecializeEncoding pass. (#…
hanhanW Feb 12, 2025
5767be3
Reland "Support fusing broadcast transposes with attention" (#19962)
IanWood1 Feb 12, 2025
73be116
[LLVMGPU] Pass to decompose horizontally fused GEMMs before layout co…
MaheshRavishankar Feb 12, 2025
04dc4a4
Integrate torch-mlir at c9694c6 and disable TOSA. (#19976)
bjacob Feb 13, 2025
eb58f82
[LLVMGPU] Add fixes and tests for horizontally fused gemms through GP…
MaheshRavishankar Feb 13, 2025
78ec7f2
[DispatchCreation] Changes to dispatch region in preparation for hori…
MaheshRavishankar Feb 13, 2025
ecfe2b0
Integrate LLVM at 3e223e3 (#19978)
bjacob Feb 13, 2025
32cfabf
[Dispatch] Fix error in FuseMultiUseElementwiseProducerPass (#19977)
IanWood1 Feb 13, 2025
863e705
[LLVMGPU] Support masked contraction in operand upcasting (#19972)
manupak Feb 13, 2025
8fab35c
Drop nanobind workaround (#19912)
marbre Feb 13, 2025
0ff26a7
[Codegen] Add support to emulate unsupported float type (#19943)
pashu123 Feb 13, 2025
f6481fb
[VM] Add support for UI64 to F32 casts (#19556)
zjgarvey Feb 13, 2025
51352e0
[SPIRV] Fix softmax compilation failure (#19985)
IanWood1 Feb 13, 2025
b31b408
Integrate torch-mlir at aa74936c (#19979)
bjacob Feb 14, 2025
b3ef1d5
Integrate LLVM at a57e58d (#19990)
bjacob Feb 14, 2025
756e9e6
[Encoding][NFC] Refactor dropEncoding method to "EncodingTypes.h". (#…
hanhanW Feb 14, 2025
[Codegen][Tuner] skip linking based on the default entry point attribute (iree-org#19603)

This PR generalizes the cases in which the linking pass can be skipped
based on the presence of the default entry point attribute.

---------

Signed-off-by: Bangtian Liu <[email protected]>
bangtianliu authored Jan 6, 2025
commit 763406f9c9b5b02cd9b0c9a356ca9848a9685c4f
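
To make the change concrete before the diff: the pass now skips the linking stage whenever exactly one tuning spec is present and that spec already carries the default entry point attribute, whether it is the user-provided spec or the built-in default. Below is a minimal sketch of how that path is exercised, adapted from the test material in this commit; the file names user_spec.mlir and input.mlir are illustrative placeholders, while the attribute names and iree-opt flags are taken from the RUN lines and test module shown further down.

// user_spec.mlir: a user-provided tuning spec that is already marked as the
// default entry point, so the materialization pass uses it directly.
module @user_spec attributes { transform.with_named_sequence,
                               iree_codegen.tuning_spec_with_default_entrypoint } {
  transform.named_sequence @__kernel_config(%arg0: !transform.any_op {transform.readonly})
      -> !transform.any_op attributes { iree_codegen.tuning_spec_entrypoint } {
    // Yield the payload unchanged; a real spec would apply tuning configs here.
    transform.yield %arg0 : !transform.any_op
  }
}

// With only this one spec available, linking is skipped and the spec is
// serialized as-is:
//   iree-opt --pass-pipeline='builtin.module(iree-codegen-materialize-tuning-specs)' \
//     --iree-codegen-tuning-spec-path=user_spec.mlir \
//     --iree-codegen-dump-tuning-specs-to=- input.mlir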
@@ -197,14 +197,30 @@ struct MaterializeTuningSpecsPass final
       return;
     }

-    // If only the default tuning spec is available, use it directly and skip
-    // the linking stage.
-    if (!hasUserTuningSpec) {
-      if (failed(dumpFinalTuningSpecToDir(*defaultTuningSpec))) {
+    // When the user tuning spec and default spec are available, link all
+    // available libraries into a single module. We insert the default tuning
+    // spec last, so that any user-specified tuning configurations take
+    // precedence.
+    SmallVector<ModuleOp, 2> allSpecs;
+    if (hasUserTuningSpec) {
+      allSpecs.push_back(*userTuningSpec);
+    }
+    if (hasDefaultTuningSpec) {
+      allSpecs.push_back(*defaultTuningSpec);
+    }
+
+    // Determine if the linking pass should be skipped.
+    // Skip if there is only one tuning spec (either user-provided or default)
+    // with the default attribute.
+    if (allSpecs.size() == 1 &&
+        allSpecs[0]->hasAttr(kTuningSpecDefaultEntrypointAttrName)) {
+      // Use the appropriate tuning spec (user or default).
+      ModuleOp tuningSpecWithDefaultAttr = allSpecs[0];
+      if (failed(dumpFinalTuningSpecToDir(tuningSpecWithDefaultAttr))) {
         return signalPassFailure();
       }
       FailureOr<DenseElementsAttr> serializedSpec =
-          serializeTuningSpecToAttr(*defaultTuningSpec);
+          serializeTuningSpecToAttr(tuningSpecWithDefaultAttr);
       if (failed(serializedSpec)) {
         module->emitError("Failed to serialize default tuning specs");
         return signalPassFailure();
@@ -213,14 +229,6 @@ struct MaterializeTuningSpecsPass final
       return;
     }

-    // When the user tuning spec is available, link all available libraries into
-    // a single module. We insert the default tuning spec last, so that any
-    // user-specified tuning configurations take precedence.
-    SmallVector<ModuleOp, 2> allSpecs = {*userTuningSpec};
-    if (hasDefaultTuningSpec) {
-      allSpecs.push_back(*defaultTuningSpec);
-    }
-
     Location loc =
         FusedLoc::get(ctx, llvm::map_to_vector<2>(allSpecs, [](ModuleOp m) {
           return m.getLoc();
2 changes: 2 additions & 0 deletions compiler/src/iree/compiler/Codegen/Common/test/BUILD.bazel
@@ -107,6 +107,7 @@ iree_lit_test_suite(
             "reductions_codegen_spec.mlir",
             "reductions_match_spec.mlir",
             "tuning_spec.mlir",
+            "tuning_spec_default.mlir",
         ],
     ),
     cfg = "//compiler:lit.cfg.py",
@@ -118,6 +119,7 @@ iree_lit_test_suite(
         "reductions_codegen_spec.mlir",
         "reductions_match_spec.mlir",
         "tuning_spec.mlir",
+        "tuning_spec_default.mlir",
     ],
     tools = [
         "//tools:iree-opt",
@@ -104,6 +104,7 @@ iree_lit_test_suite(
     reductions_codegen_spec.mlir
     reductions_match_spec.mlir
     tuning_spec.mlir
+    tuning_spec_default.mlir
 )

 ### BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_BELOW_THIS_LINE ###
@@ -3,6 +3,11 @@
 // RUN:   --iree-codegen-dump-tuning-specs-to=- \
 // RUN:   --mlir-disable-threading --no-implicit-module %s | FileCheck %s

+// RUN: iree-opt --pass-pipeline='builtin.module(iree-codegen-materialize-tuning-specs)' \
+// RUN:   --iree-codegen-tuning-spec-path=%p/tuning_spec_default.mlir \
+// RUN:   --iree-codegen-dump-tuning-specs-to=- \
+// RUN:   --mlir-disable-threading --no-implicit-module %s | FileCheck %s --check-prefix=SKIPLINK
+
 // Check that the final tuning spec is as expected when the user tuning spec is provided.

 // CHECK-LABEL: module @iree_linked_tuning_spec
@@ -19,6 +24,17 @@
 // CHECK-SAME:  iree_codegen.tuning_spec_mlirbc = dense<{{.+}}> : vector<{{[0-9]+}}xi8>
 // CHECK-LABEL: func.func @main_0

+
+// Check that the user-provided tuning spec is materialized without linking when the default
+// tuning spec is missing and the user-provided spec is marked with the default attribute.
+
+// SKIPLINK-LABEL: module @user_spec
+// SKIPLINK-SAME:  iree_codegen.tuning_spec_with_default_entrypoint
+// SKIPLINK-SAME:  transform.with_named_sequence
+// SKIPLINK-NOT:   module @{{.+}}
+// SKIPLINK:       module attributes
+// SKIPLINK-SAME:  iree_codegen.tuning_spec_mlirbc = dense<{{.+}}> : vector<{{[0-9]+}}xi8>
+// SKIPLINK-LABEL: func.func @main_0
 module {
   func.func @main_0() {
     return
@@ -0,0 +1,9 @@
+// RUN: iree-opt %s
+
+module @user_spec attributes { transform.with_named_sequence, iree_codegen.tuning_spec_with_default_entrypoint } {
+  transform.named_sequence @__kernel_config(%arg0: !transform.any_op {transform.readonly}) -> !transform.any_op
+      attributes { iree_codegen.tuning_spec_entrypoint } {
+    transform.print {name = "Hello Tuning Spec", skip_regions}
+    transform.yield %arg0 : !transform.any_op
+  }
+}