Releases · hidet-org/hidet
Hidet v0.5.0
What's Changed
- [BUG] Add comp server requirements (#661) by Vadim Gimpelson 300fd33
- [BUG] A number of fixes for vllm's TP (#651) by Vadim Gimpelson 9c29f66
- matmul_f16 with wgmma (#627) by kjiang170 9f0ea7d
- [BUG] VLLM (and DMWL) compile with hidet backend (#647) by zhumakhan 6c6be7a
- [IR] Add support for `swizzle`, `interleave` and `l2Promotion` in tensor map creation (#643) by Bolin Sun 21ff63f
- [BUG] fix attach hash to signature (#638) by xiaocenxiaocen dbd6613
- Hexcute base branch (All related PRs will be merged into this base PR.) (#294) by xiaocenxiaocen b1fdf17
- [PERF] Default value for parallel_k is 'disabled' (#634) by Vadim Gimpelson 135212b
- Adapt to bfloat16 where necessary (#624) by ZichuWu 9045865
- [Bug] Parallel compilation sync (#616) by ZichuWu 4c16c57
- [COMPTIME] Hot start speedup (#625) by Vadim Gimpelson 22c657b
- [BUG] Fix torch2.5 OoM and docs build fix (#637) by zhumakhan bf32f8b
- Revert "[BUG] Fix torch2.5 OoM issue" (#635) by zhumakhan 9131a5c
- [BUG] Fix torch2.5 OoM issue (#609) by zhumakhan fe59c63
- [CI] Fix small typos for building and publishing to internal Hidet PYPI Index (#598) by xinli-centml f8400fe
- [PERF] Support bf16 in one more place (#623) by Vadim Gimpelson 7f77349
- [Tests] Adapt tests/operators for bfloat16 (#615) by ZichuWu ba9c0ad
- [DISTRIBUTED] Support `all_reduce` in `torch.compile` mode (#612) by Vadim Gimpelson 0bca591
- [torchAPI] Inherit cuda stream from torch (#618) by Vadim Gimpelson ad4e00a
- [BUG] Fix bugs in shared map implementation (#608) by Vadim Gimpelson ffdbde4
- [CI] Turn off search space 2 for tests/lang (#617) by ZichuWu 5f7fae8
- [Tests] Adapt tests/lang for bfloat16 test cases (#594) by ZichuWu 5b829cb
- [Tests] Adapt tests/frontends to bfloat16 (#592) by ZichuWu a5b72e6
- [Tests] Adapt tests/ir for bfloat16 test cases (#593) by ZichuWu 545aeea
- [Tests] Adjust test cases for tests/models for bfloat16. (#595) by ZichuWu bedff21
- Use one global cuda workspace for all the CompiledGraph (#603) by Max Hu 6652307
- [Fix] Fixing a minor mistake encountered while adapting test cases for `bfloat16` data type (#607) by Bolin Sun 275070d
- Kaihang/wgmma tf32 u8 i8 support (#549) by kjiang170 a0e6658
- [CI] Exclude tests/unit_tests/test_dynamic_shape.py::test_attention[cuda] (#606) by Vadim Gimpelson 5579392
- [Tests] Adjust test cases for tests/unit-tests for bfloat16. (#596) by ZichuWu 0e5ec55
- [BUG] Fix incorrect converting fxgraph to hidet's flow graph + expand looking for nccl lib with user site packages (#604) by Vadim Gimpelson 1995d43
- [Tests] Added bfloat16 test cases for tests/cuda (#590) by ZichuWu febfbd7
- [Tests] Adjust test cases for tests/utils for bfloat16. (#597) by ZichuWu 36aab6f
- [Tests] Change float16 to bfloat16 for tests/apps (#589) by ZichuWu 83cddbb
- [CI] add new github actions workflow to manually build and push to internal pypi index (#554) by xinli-centml 6beffab
- [OPTIONS] Remove unnecessary parallel_k (#572) by ZichuWu 9051f26
- fix test_wgmma.py error for illegal warp address (#588) by kjiang170 8f7e139
- [Operators] Allow NT `matmul` layout for `bfloat16` data type (#562) by Bolin Sun d5d0e51
- python3.8 -> python3.9 (#558) by Vadim Gimpelson a09713c
- [CI] Move import torch inside run_torch() (#570) by ZichuWu 4bc4d29
- [CI] Shorten build-docs run time (#565) by ZichuWu edadb07
- [CI] Tests Workflow. Add manual trigger of tests on different gpu types (#555) by c-fteixeira 66d9568
- [OPTIONS] Clean Huggingface tokens option (#561) by ZichuWu cdf2c8a
- [Bug] Fix out of memory error occurred while running `llama-2-7b` (#547) by Bolin Sun b8826d0
- [OPTIONS] Set mma as default in PassContext() (#530) by ZichuWu 35f02b9
- wgmma bf16 support (#531) by kjiang170 f8c057b
- [Bug] ‘uint32_t’ was not declared in this scope in CI build-wheel for runtime (#545) by ZichuWu 4ced47e
- Add more shapes to reduce op in regression (#534) by zhumakhan 8ef1bc2
- [COMPTIME] Added support for run_torch for the rest of transform operation (#525) by ZichuWu 04e4d5e
- f16 rest options supported and tested (#527) by kjiang170 e5e2404
- [Operators] `bfloat16` data type support for attention operators (#524) by Bolin Sun 07e597a
- [Enhancement] Save running time by using symbolic_run to replace async_run in optimize (#490) by ZichuWu 92c81e8
- [BUG] Fix distilbert by changing variables names in ops.where (#512) by zhumakhan 2d615b6
- [OP] Support of `logsoftmax` (#517) by Vadim Gimpelson ce43f1e
- refactor wgmma (#521) by kjiang170 4a80b9a
- [Bug] Fix the incorrect result after merging changes related to `matmul_nt` (#518) by Bolin Sun 2b7c348
- [PERF] Rewrite softmax (#516) by Vadim Gimpelson b50cca4
- wgmma instruction support and test for f16 input … (#499) by kjiang170 c758e54
- [BUG] Fix NT matmul corner case where `n` or `k` dimension is odd (#513) by Bolin Sun 1e54f77
- [Operators] Support `bfloat16` data type in `matmul` operator (#511) by Bolin Sun a467c76
- [Operators] Support matmul with NT layout (#496) by Bolin Sun 8fc6de3
- [CI] Make test and publish workflows use built wheel on tests (#492) by c-fteixeira bc5b54e
- [Hidet Script] Import externally defined function automatically (#503) by Yaoyao Ding 43750c2
- [PERF] Fix for indexes optimization (#488) by Vadim Gimpelson f8c679a
- [CI] Update the set of Regression tests (#493) by Vadim Gimpelson 7e3ae1f
- [Enhancement] Causal attention with fp32 accumulator (#481) by zhumakhan 8b569bd
- [IR] Bound check for task mapping worker (#483) by Vadim Gimpelson 1544cdf
- [Bug] Rule based simplifier. Fix incorrect rule e/c1/c2 -> e/(c1*c2) (#487) by Vadim Gimpelson fd6b439
- [TOOLS] Task benchmark utilities (#479) by Vadim Gimpelson dc175f2
- [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#464) by Max Hu c8d9158
- Revert accidental commit (#484) by Vadim Gimpelson 6c8ad3e
- bug fix by Vadim Gimpelson 3405b55
- [PERF] Continue indexes optimisations (#473) by Vadim Gimpelson da24ee3
- [Bug] Resolved multi-threading conflict with save_lower_ir() (#480) by ZichuWu 6a116ad
- Fixed the format change on the new transformers version (#482) by ZichuWu 0a81840
- Fix masked attention by using fp32 accumulate on first matmul (q and k) part (#468) by zhumakhan 40c12c9
- remove mpt-7b due to accuracy failure (#477) by zhumakhan 53a0cc4
- [BUG] Support concat empty tensors (#475) by ZichuWu 85bb6dd
- [TOOLS] Attached hash values to function signature in source.cu (#459) by ZichuWu a6f1033
- [BUG] Fix `ValueError` caused by different operand data types in `if_then_else` while initializing `Conv2dTransposeGemmImageTask` (#470) by Bolin Sun 2826490
- [BUG] Fix `ZeroDivisionError` triggered within the function `parallel_part_heuristic` in `graph/ops/conv2d/conv2d_gemm.py` (#472) by Bolin Sun a11d69c
- [BUG] Fixing memory issue encountered while compiling the model `sam` (#466) by Bolin Sun c695974
- [PERF] Indexes optimization (#458) by Vadim Gimpelson f1ee08f
- Added more llms to Regression test (#432) by zhumakhan 03d6250
- Revert "[Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod" (#463) by Max Hu 2989389
- [Dynamic][Enhancement] Convert div and mod including symbolvars to fast int div/mod (#405) by Max Hu 0cffe7e
- [CI] Print stderr in `run_tests.py` (#443) by Vadim Gimpelson 015ffcd
- [BUG] Fix `NotImplementedError` encountered while compiling the model `doctr_det_predictor` (#462) by Bolin Sun 868dc9d
- [Operators] Adding support for `torch.nn.GLU` module (#461) by Bolin Sun f756051
- [BUG] Fixing another error encountered while compiling `detectron2_fcos_r_50_fpn` (#457) by Bolin Sun 798ce6e
- [Ir][Primitives] fix #436 via adding missing instructions (#440) by xiaocenxiaocen 131ec20
- [BUG] Fixing errors encountered while compiling `detectron2_fcos_r_50_fpn` (#455) by Bolin Sun c74732d
- [PERF] Introduce the new IR optimization Pass that spatial(1,47) -> spatial(47) (#452) by Vadim Gimpelson 0f2990b
- [Bug] Fixing the `ValueError` triggered while compiling the model `dlrm` during operator fusion pass (#437) by Bolin Sun de94946
- [Scripts] Add scripts of our wheel server (#439) by Yaoyao Ding 628eb60
- [Graph][Ops] disable cublas matmul for parallel k (#431) by xiaocenxiaocen 2696c34
- [BUG] Fixing an error triggered from the `conv_channel_last_pass` while compiling the model `sam` (#444) by Bolin Sun ba45522
- [BUG] Fixing a bug triggered while compiling in-place operator `torch.Tensor.scatter_add_` (#429) by Bolin Sun 4f142c4
- [PERF] Specialize pow(x,2) as x*x. llama-7B (#434) by Vadim Gimpelson f421a43
- [Version] Update 0.4.0 -> 0.5.0.dev in `setup.py` (#433) by Vadim Gimpelson d9da46f
- [PERF] Allow prologue fusion for `reduce` op (#426) by Vadim Gimpelson 6606477
- [Bug] fixing regression (#422) by zhumakhan 646f7e7
- [Utility] Add ncu and nsys test utilities (#413) by Yaoyao Ding 2fc304f
- [Operators] Adding support for the method `torch.Tensor.scatter_add_` (#421) by Bolin Sun 8568afb
- [Fix] fixed torch.pow (#420) by zhumakhan cac4a0e
- [Primitives] Add CUDA primitives: prmt, lop3, f16x2 sub and fma, and barrier (#414) by Yaoyao Ding 5186d87
- [Ir][Primitives] add exp2 (#410) by xiaocenxiaocen bbbfb7b
- [Update] Updating torch docker image from 24.04 to 24.07 (#418) by zhumakhan 9899060
- [Fix] Support writing subbyte data to global memory (#415) by Yao...
Hidet v0.4.0
What's Changed
- [Fix] Fixing an error triggered by the operator `any` (#369) by Bolin Sun 6a4c2e5
- [Fix] added torch.t for mobilebert-uncased model (#353) by zhumakhan 95d95a4
- [CI] Use same image for tests and publishing test execution (#463) by c-fteixeira 49fd332
- [BUG] fix bug in disallow in graph (#464) by Vadim Gimpelson d84f2c5
- [CI] Move Publish workflow to internal ARC runners (#461) by c-fteixeira b5d6aaf
- [CI] Update container for CI (#460) by Vadim Gimpelson b973591
- [Bug] Rename test_arithmetic.py -> test_arithmetic2.py (#459) by Vadim Gimpelson 6aa6cf8
- Update requirements-dev.txt to use pytorch version >= 2.3.0 (#458) by Vadim Gimpelson 6b32295
- [CI] Repeat start_instance (#361) by vadiklyutiy cf5cadd
- [Operators] Adding `leaky_relu` support (#360) by Bolin Sun 7401ccc
- [Fix] Fixing an error triggered while compiling the `torch.nn.Upsample` module with `align_corners=True` (#344) by Bolin Sun 2c34cfc
- [PERF] Remote workaround for loops in `add_hints_pass` (#356) by vadiklyutiy 3195be5
- [Operators] Registering tensor methods whose PyTorch function equivalents are supported by Hidet (#347) by Bolin Sun 44ab5ad
- [PERF] Introduce add_hint_pass (#355) by vadiklyutiy c014dab
- [CI] Promote nvidia docker container to version 24.4 (#354) by vadiklyutiy cb809b9
- [Fix] type casting for attention mask from fp32 -> f16 (#323) by zhumakhan 9a10dc0
- [Fix] Added missing torch.multiply and torch.nn.functional.unfold ops for conv-bert-base model (#351) by zhumakhan 18842ee
- [Fix] Fixing a bug in `register_methods` (#331) by Bolin Sun c87c515
- [Fix] Handling special cases in `setitem` regarding dtype and device (#332) by Bolin Sun ff9445e
- [BUG] Fixed search_space bug in `bench_op.py` (#348) by vadiklyutiy 29e4c0e
- [OPS] Disallow unsupported functions in fxgraph (#317) by vadiklyutiy 984cf75
- [OPTIONS] Remove dynamo_config['search_space'] (#342) by vadiklyutiy 0814bd8
- [Operator] Adding support for `torch.Tensor.view_as` (#334) by Bolin Sun 5f19dd0
- [Operators] Adding support for `torch.nn.TransformerEncoder` (#327) by Bolin Sun d625146
- [OPTIONS] Inherit `options` from `torch.compile()` (#260) by vadiklyutiy 3638a0b
- [Operator] Adding `__ge__` method for the `Tensor` class (#330) by Bolin Sun ed5feff
- [Fix] Fixing an error triggered by `ClampOp` (#329) by Bolin Sun 05984cb
- [Fix] Handling hidet errors caused by device difference in `getitem` (#322) by Bolin Sun 5a90820
- [Fix] Fixing a RuntimeError triggered by `tensor_reshape` function in `register_functions.py` (#328) by Bolin Sun 0cd2f83
- [Operators] Adding PyTorch operators encountered while compiling `DALLE2_pytorch` (#319) by Bolin Sun ecb99b1
- [Fix] Fix the bug in `tensor_expand` caused by attempting to modify `immutable_list` (#320) by Bolin Sun bb89e22
- [Chore] replace copyrights with citations (#315) by xiaocenxiaocen 3fba091
- [Operator] Extending the functionality support for `einsum` (#312) by Bolin Sun 703e92a
- Handle dtype and device in hidet.ones_like op (#316) by zhumakhan f031eb3
- [PERF] Reduce fixed overhead for model run (#310) by vadiklyutiy fadf67d
- Increase batch size for bert to decrease fluctuations (#236) by vadiklyutiy a8db40c
- Setitem with tensor values. And Boolean type promotion (#290) by zhumakhan 60e75ca
- [BUG] when device is None, device_from_torch returns 'cpu' by default. Fixed (#311) by zhumakhan d047440
- [Graph][Ops] fp32 accumulation for cute matmul (#292) by xiaocenxiaocen a813605
- [Perf] support vectorized epilogue fusion (#220) by xiaocenxiaocen ddacf36
- Removing constant tensors that are not needed after subgraph rewrite pass (#252) by zhumakhan db49f68
- [Fix] Handling `Tensor.to(..., device=....)` on symbolic tensors (#284) by Bolin Sun 6357880
- [Operator] torch.any (#287) by zhumakhan 8a42a65
- [Graph][Ops] fp32 accumulation for matmul_f16 (#268) by xiaocenxiaocen 5bf255a
- adding support for torch.any (#277) by zhumakhan 2c4c672
- fix: handles race condition on parallel config directory creation (#285) by c-fteixeira b465dd3
- [SCRIPTS] Adopt our scripts to use `mode` from `torch.compile` (#274) by vadiklyutiy 0f825b3
- [Fix] Handling `getitem` special case (#281) by Bolin Sun 564561e
- [Operator] Added advanced tensor indexing (#251) by zhumakhan 018ca2c
- [Operator] Adding support to `repeat_interleave` and more (#270) by Bolin Sun b52bc88
- [PERF] Increase accuracy of picking the best candidate (#269) by vadiklyutiy 3834643
- [Operator] Registering `torch.Tensor.copy_` (#259) by Bolin Sun af5c893
- [OPTIONS] Use Attention by default (#261) by vadiklyutiy 33ad85b
- [Operator] Registering torch.sigmoid_ (#258) by Bolin Sun c9fb801
- [Operator] Adding support for `torch.Tensor.div` (#249) by Bolin Sun c8d4663
- [Operator] Adding `torch.Tensor.expand_as` support (#250) by Bolin Sun 923f078
- [Operator] Adding support to operators `torch.Tensor.max` and `torch.Tensor.new_full` (#238) by Bolin Sun c5912a4
- Delete options `use_fp16` and `use_fp16_reduction` (#239) by vadiklyutiy e7fe23b
- Inherit `mode` argument from `torch.compile` and set corresponding options (#237) by vadiklyutiy 91f666e
- [Operators] Registering `torch.as_tensor` (#235) by Bolin Sun 540367b
- [Operator] Registering `torch.Tensor.argmax` (#234) by Bolin Sun bdd7acd
- [Ir][CuTE] lower cute dialect (#109) (#230) by xiaocenxiaocen 783a549
- Xiaocenxiaocen/expose more ldst instructions (#216) by xiaocenxiaocen 8f03f9e
- steal_weight option fixes && fixes for mistral model (#209) by zhumakhan 9728c21
- Fix issues related to mistral model (#213) by zhumakhan 68e801b
- [BENCHs] Refactor transformers tests. Add llama2, mistral, gemma, gpt2 to script (#210) by vadiklyutiy 59028d8
- [BUGFIX] Init cuda info before run forks for IR generation (#208) by vadiklyutiy 3012546
- [Ir] add utilities for CuTe (#107) by xiaocenxiaocen 423e112
- [BUG] Clear `_job_queue` in `parallel_imap` for tests (#204) by vadiklyutiy bf39bd6
- [OPTIONS] Don't create hidet config if it doesn't exist (#203) by vadiklyutiy 294d261
- feat: parallel job execution for tests (#147) by c-fteixeira db588f9
- __getitem__ with N dimensional index tensor (#185) by zhumakhan f46a184
- [Fix] Remove YOLOv7 from tests/benchmarks/run_configs.json (#187) by Bolin Sun 5fc4271
- [Operator] Adding meshgrid operator support (#183) by Bolin Sun d8158a9
- [Bug] Fix number of groups under certain case (#181) by Max Hu 8a6cbfd
- [COMPTIME] Reduce the number of `fork` in `multithreading.Pool` (#180) by vadiklyutiy 9e576dc
- [COMPTIME] Add `chunksize` arg to `pool.imap` (#178) by vadiklyutiy 7c50af6
- optimize grouping method (#174) by Max Hu 9b9a22b
- [App] SyncLLM + AsyncLLM interface (#166) by Jack Lee e51f0c0
- [Ir][Primitives] add hopper instructions (#83) by xiaocenxiaocen 4225298
- [OPS] Add `torch.Tensor.sin`, `torch.Tensor.cos` and `torch._C._nn.pad` (#175) by vadiklyutiy 90a6231
- [App] ResNet Compiled App (2/2) - Pipeline (#165) by Kevin Tong d308f8f
- Revive dynamic shape support with `torch.compile` (#162) by vadiklyutiy cf343ab
- [Models] Gemma implementation (#132) by Jack Lee 3a84820
- Support Transpose2D (#77) by zhiwei-fang dd2e9d2
- [App] Cleanup SD Implementation (#143) by Kevin Tong 359763e
- [Fixbug] Set _is_exiting correctly (#163) by Jack Lee 1c8b31f
- [App] Fix LLM app tracing (#158) by Jack Lee f618977
- [Operator] triu + tril operators (#146) by Jack Lee 70894fa
- Gemma+torch.compile fixes(autocast, rtruediv) (#159) by vadiklyutiy 710ac50
- [IR] [Primitives] Add thread cluster on sm_90 (#145) by Kevin Tong ccc28d6
- [App] Minor bugfixes for LLM app (#157) by Jack Lee 179f058
- [COMPTIME] Specialize `Constant._binary()` for compilation speedup (#148) by vadiklyutiy 8a1eab4
- [Operator] Fix symbolic broadcasting (#131) by Jack Lee 1252220
- [Operator] Register missing math primitives (#134) by Jack Lee 61b0052
- [Ir][Primitives] fix __shfl_xor_sync (#155) by xiaocenxiaocen 37c75a6
- [COMPTIME] Parallelize `apply_prologue_epilog` (fusion) and IR generation (`implement*`) (#127) by vadiklyutiy 9e96c45
- [Graph] Enhance forward debug instrument (#130) by Jack Lee 4267686
- Stable Diffusion App Infra (#103) by Kevin Tong 8f03f9e
- [LLM App] LLM Application initial support (#121) by Yaoyao Ding fc61f48
- [Models] Support for tokenizers in C++ runtime (#69) by Jack Lee c14de4e
- [Graph] Add major UNet building components (#97) by Kevin Tong 364ba9c
- [CI] Add clang-format script/action (#120) by Jack Lee cdff99a
- [Graph] Stable Diffusion Rope Module (#95) by Kevin Tong 6fa5803
- [App] Complete UNet Definition (#99) by Kevin Tong 805620e
- [FFI] Refactor CompiledFunction interface with ctypes (#79) by Jack Lee a8c9d94
- [STYLE] Format cpp/h files (#454) by vadiklyutiy 1f1b011
- [cuDNN] Add cudnn conv2d (#453) by vadiklyutiy bc5a6df
Contributors
- @yaoyaoding
- @xiaocenxiaocen
- @vadiklyutiy
- @maxyanghu
- @BolinSNLHM
- @zhumakhan
- @c-fteixeira
- @jacklee1792
- @KTong821
- @zhiwei-fang
Full Changelog: v0.3.1...v0.4.0
Hidet v0.3.1
What's Changed
- [Version] Bump version to v0.3.1.dev by @yaoyaoding in #361
- [Option] Add an option to disable imperative execution by @serach24 in #362
- [Graph][Benchmark] Update benchmark function by @Aalanli in #363
- [Compile Server] Update deps for compilation server by @xinli-git in #365
- [Utils] Changed the multiprocessing context by @destefy in #367
- [Dynamo] Refactoring code for Hidet remote compilation by @destefy in #369
- [Graph][Dynamo Backend] Lshift/Rshift/Mod by @Aalanli in #371
- [Graph][Operator] Fix reduce bug, add uint8x4 by @Aalanli in #372
- [CompiledGraph] Add option to store dispatch table option by @destefy in #377
- [Graph][Tensor] remove unnecessary synchronization by @xiaocenxiaocen in #374
- [Graph][Dynamo Backend] Minor imperative run bug fix by @Aalanli in #383
- [Graph] Fix CompiledGraph aliasing bug by @Aalanli in #384
- [Frontend] Add mapping for `torch.sqrt` by @yaoyaoding in #387
- [Fix][Graph] Write compiled graph to tempfile first by @destefy in #392
- [Operators] Improving fp32 matrix multiplication on x86 CPUs by @BolinSNLHM in #378
- [Fixbug] Fix a bug related to c/c++ integer promotion by @yaoyaoding in #391
- [Option] Add option to set class Var id attribute to 0 by default by @destefy in #393
- [CI] Add CI workflow and scripts by @hjjq in #394
- [CI] Fix deadlock by @hjjq in #395
- [Operator] Enhancements to Reduce by @hjjq in #366
- [CI] Launch and stop compile server via workflow by @hjjq in #396
- [Operator] Support advanced options for pooling operators by @yaoyaoding in #399
- [Torch] Implements torch_func protocol by @yaoyaoding in #400
- [Docs] Add more documentation by @yaoyaoding in #401
- [Fixbug] Fix a performance bug in auto-scheduler by @yaoyaoding in #402
- [Library] Add cublas library by @yaoyaoding in #404
- [Operator] Add `hidet.ops.matmul_cublas` operator by @yaoyaoding in #405
- [Fusion] Allow shallow fusion of cublas operator by @yaoyaoding in #407
- [CI] Clear op cache by @hjjq in #406
- [Runtime] Add a new compiled format CompiledApp by @yaoyaoding in #408
- CPU AVX implementation for Softmax, Norm by @fishingguy456 in #357
- [CI] Reduce scope of secrets by @hjjq in #413
- [Operator] Add a opaque operator base class by @yaoyaoding in #414
- [IR] Support inplace operators by @yaoyaoding in #416
- [Graph][Quantization] Multi-stage software pipelining and update parallel k rule by @Aalanli in #364
- [CI] Trigger workflow by @hjjq in #417
- [Scheduler] Add the fused task name to auto-scheduled kernels by @yaoyaoding in #418
- [CI] Use cudagraph for benchmarks by @hjjq in #419
- [CI] Remove unnecessary synchronization by @hjjq in #420
- Update Netron viewer link by @KTong821 in #421
- [Operator] Add cublas to matmul tune space by @hjjq in #422
- [IR] Support integer subbyte by @xiaocenxiaocen in #403
- [README] Fix ONNX link by @dbabokin in #425
- [cuBLAS] Add cublas_gemm_batched and use cublasSetStream to set stream to the current stream in all cublas API calls by @yudi0201 in #423
- [Fixbug] Fix dynamic memcpy bug by @KTong821 in #427
- [Compile Server] Fetch repo before checking out by @hjjq in #429
- [CI] Use slurm for runners by @hjjq in #430
- [CI] CI migration by @hjjq in #433
- [Fixbug] Fix graph metadata hash by @KTong821 in #428
- [CI] Add back tests by @hjjq in #436
- [Fix] Skip a failed test due to huggingface transformers update by @yaoyaoding in #439
- [RC] Release candidate for version 0.3.1 by @yaoyaoding in #442
New Contributors
- @destefy made their first contribution in #367
- @xiaocenxiaocen made their first contribution in #374
- @fishingguy456 made their first contribution in #357
- @KTong821 made their first contribution in #421
- @dbabokin made their first contribution in #425
- @yudi0201 made their first contribution in #423
Full Changelog: v0.3.0...v0.3.1
Hidet v0.3.0
Notes
In this release, we add more support for large language model inference, distributed inference, and quantization. We also make Hidet Script more stable and add more documentation for it. More operators and models are supported. See below for more details.
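
For reference, the model-inference support highlighted above is typically driven through the PyTorch frontend. Below is a minimal sketch, assuming a CUDA-enabled installation of hidet and PyTorch; the toy model and input are illustrative placeholders, not taken from these release notes.

```python
# Minimal sketch: compile a toy PyTorch model with hidet's torch.compile backend.
import torch
import hidet  # importing hidet makes the 'hidet' dynamo backend available

# Placeholder model and input, chosen only for illustration.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
).cuda().eval()
x = torch.randn(8, 64, device='cuda')

# Compile with the hidet backend and run inference.
compiled_model = torch.compile(model, backend='hidet')
with torch.no_grad():
    y = compiled_model(x)
print(y.shape)
```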
Frontend
- [Frontend] Dynamic shape fx trace by @Aalanli in #294
- [Torch] Steal Pytorch weights by @hjjq in #310
- [Dynamo Frontend] Refactor the dynamic shape support by @yaoyaoding in #319
- [Torch][Graph][Operator] Add and fix various items for torchvision model support by @hjjq in #347
- [Dynamo] minor enhancements to attention and register a few functions by @xinli-git in #345
Operators and models
- [Operator] Further performance enhancements for conv2D by @Aalanli in #290
- [Operator] Refactoring matrix multiplication implementation by @yaoyaoding in #296
- [Model Support] Add support for wav2vec by @yaoyaoding in #303
- [Operator] Update attention for dynamic shape by @hjjq in #307
- [Operator] Resolve Adaptive Pool to reduce by @hjjq in #308
- [Reduce] optimize and unify reduce operator to a single place by @xinli-git in #311
- [Operator] optimize normalize op with vectorized load, dynamic shape and more by @xinli-git in #316
- [Model] Add missing operators for T5 by @yaoyaoding in #322
- [Fixbug] Reduce should perform syncthread after initializing shared memory to zero by @xinli-git in #325
- [Models] Llama 2 support by @Aalanli in #324
- [Models] Llama2 fix by @Aalanli in #333
- [Operator] Composite Elementwise Operation by @hjjq in #337
- [Operator] Add clamp/isinf/any/all op, enhance where op by @yaoyaoding in #343
- [Torch][Operator] More torchvision model support by @hjjq in #348
- [Operator] Add einsum by @hjjq in #349
- [Operator][Graph][Regression] CNN optimizations by @hjjq in #356
- [Graph] Minor bug fixes by @hjjq in #358
Distributed inference
- [Distributed] all_reduce op and distributed info in graphs by @soodoshll in #284
- [Distributed] Add more runtime distributed communication functions by @soodoshll in #314
- [Fixbug] group_start and group_end should be able importable without nccl by @soodoshll in #317
Quantization
- [Operators] preliminary symmetric weight quantization by @Aalanli in #298
- [Quantization] Quantization API by @Aalanli in #309
- [Quantization] fix quantization pass bug by @Aalanli in #355
IR and passes
- [FixBug] Don't instantiate symbol for primitive functions by @hjjq in #291
- [Fix] NCCL API mismatch and NCCL primitive fix by @soodoshll in #301
- [Fixbug] Prevent allreduce op from being fused by @soodoshll in #304
- [Enhancements] add a vcude device to help mitigate compile time GPU memory usage by @xinli-git in #302
- [Task] More descriptive kernel names for nsys/ncu by @Aalanli in #315
- [Fixbug][Hidet Script] Fix a bug that hidet script does not recognize return type by @yaoyaoding in #329
- [Hidet script] Add `hidet.lang.types` submodule by @yaoyaoding in #340
- [IR][Parser] Hidet IR grammar, parser and ir reconstructor by @Aalanli in #354
Runtime
- [Runtime] Check for input tensor device by @hjjq in #287
- [Fixbug] Is exiting fix by @xinli-git in #293
Backends
- [Fixbug] Fix the c++ standard to c++11 for both nvcc and gcc compilers by @yaoyaoding in #327
- [CPU][Scheduler] Use multi-threads for auto-scheduler by @yaoyaoding in #341
Documentation
- [Document] fix installation guide by @soodoshll in #288
- [Docs] Update the documentation for the coming release by @yaoyaoding in #360
Others
- [Version] Bump version to 0.3.0.dev by @yaoyaoding in #286
- [Tools] simple benchmarking utility by @Aalanli in #292
- [Compile Server] Support remote compilation via compilation server by @yaoyaoding in #297
- [Compile Server] Allow the user to specify the repo and branch/tag to use by @yaoyaoding in #300
- [Compile Server] Add a new option to specify the cuda arch by @yaoyaoding in #305
- [Fixbug] Fix a bug in compile server by @yaoyaoding in #306
- [Graph] Minor graph benchmark fix by @Aalanli in #313
- [Regression] Local performance regression by @hjjq in #321
- [Regression] Increase benchmark iters and update perf data by @hjjq in #328
- [CI] List package versions in ci by @yaoyaoding in #334
- [Fixbug] Clear the intermediate object files for kernel tuning by @yaoyaoding in #339
- [Config] Add configuration file by @Aalanli in #359
Full Changelog: v0.2.4...v0.3.0
Hidet v0.2.4
What's Changed
- [Version] Bump version to v0.2.4.dev by @yaoyaoding in #188
- [Dynamo] module tests + operator support by @AndreSlavescu in #148
- Refactor compilation workflow to support CPU without CUDA by @LDY1998 in #189
- [Stack] Allow the ulimit stack size to be less than expected by @yaoyaoding in #195
- [Readme] Add platform requirements by @yaoyaoding in #196
- [DataType] Add complex64 and complex128 data type by @yaoyaoding in #200
- [Example] Add an example of running GPT-2 model by @yaoyaoding in #203
- [Fusion] Use inline pass in fusion to allow template call functions with kernel params by @yaoyaoding in #197
- [Frontend][Operator] Add missing operators for dinov2 by @yaoyaoding in #206
- [Backend] Add openmp support by @yaoyaoding in #208
- [Operator] Update batch_matmul to use Hidet Script by @hjjq in #207
- [Cache] Add cache management command line interface by @yaoyaoding in #212
- [IR] Creation-time constant fold for constant expressions by @yaoyaoding in #209
- [Torch][Operator] Allow change torch tensor device when possible by @yaoyaoding in #214
- [Torch][Operator] Add op mapping for torch.min/max/minimum/maximum by @yaoyaoding in #216
- [Typo] Fix a typo in resnext.py by @eltociear in #210
- [Operator] Adding missing operators for llama by @yaoyaoding in #219
- [IR] Adding more support for dynamic shape on Task and FlowGraph level by @yaoyaoding in #220
- [Torch] Add mapping for `torch.ops.aten.add` and `torch.ops.aten.cos` by @yaoyaoding in #223
- [Operator][Backend] Add nvcc flags for faster math and update Attention schedule by @hjjq in #221
- [CI] Always clear the cache before tests by @yaoyaoding in #224
- fix batch_matmul for invalid mma config for sm < 80 by @xinli-git in #227
- [Dynamic Shape] Adding more dynamic shape support by @yaoyaoding in #228
- [CI] Add `importlib_metadata` to `requirements-dev.txt` by @yaoyaoding in #233
- [Script] Add list comprehension support in hidet script by @yaoyaoding in #235
- [Refactor][Dynamic Shape] Introduce SymbolVar to implement dynamic shape by @yaoyaoding in #236
- [Script] Add pointer arithmetic by @yaoyaoding in #237
- [Operator][Torch] Add causal fmha and torch sdpa mapping by @hjjq in #238
- [Fixbug][Pass] Fix a bug in the `inline_let_stmt` pass by @yaoyaoding in #240
- [Options] Add option for controlling parallel build with number of jobs or memory reserved for each job by @xinli-git in #230
- [Typo] Fix a typo by @BolinSNLHM in #245
- [Typo] Fix minor spelling mistake by @Aalanli in #246
- [Fixbug] Fix a bug in StmtRewriter which discard declare scope information by @yaoyaoding in #248
- [Refactor] Adding support for compiled model by @yaoyaoding in #247
- [Operator] batch_matmul: Remove duplicate smem declaration by @hjjq in #249
- [Operator] Adding CPU support for matrix multiplication by @BolinSNLHM in #251
- [Hidet Script] Allow `bind_tuple` argument in `mapping.on(...)` and `grid(...)` by @yaoyaoding in #254
- [Hidet Script] Add `in` and `not in` expression in hidet script by @yaoyaoding in #255
- [Codegen] Include header files as needed by @yaoyaoding in #256
- [Operator] Add new operator "normalize" that makes a group of layers (layer norm, group norm and instance norm) faster using hidet script by @xinli-git in #257
- [Testing][Models] Add gpt2 module in testing models by @yaoyaoding in #252
- [Fixbug] Fix test warnings and the incompatibility of two recent PRs by @yaoyaoding in #258
- [Operator] Add sm75 support for attention by @hjjq in #259
- [Operator] batch_matmul: Remove unroll and reduce tuning space by @hjjq in #260
- [Fixbug] Fix a bug when fused operator has no input by @yaoyaoding in #263
- [Graph] Translate softmax and reduce to hidet script by @Aalanli in #242
- [Fixbug] batch_matmul: move cc checking inside schedule by @hjjq in #264
- [Refactor] Refactor building system and adding compiled products by @yaoyaoding in #261
- [Fixbug] Reduce the default unroll factor to 4 by @yaoyaoding in #266
- [Torch] Add some torch frontend mappings for roberta-base by @hjjq in #267
- [Refactor] Remove `schedules` submodule under `hidet.graph.ops` by @yaoyaoding in #269
- [Device] Add support for mixed cpu and cuda kernels in the same flow graph by @yaoyaoding in #270
- [Dynamic Shape] Adding dynamic shape support for reduce by @Aalanli in #268
- [Complex Type] Add more support for complex data type by @yaoyaoding in #271
- [Tools] Model translator by @Aalanli in #273
- [Model] Llama model implementation in hidet by @Aalanli in #243
- [Operator] Add support for cross attention by @hjjq in #275
- [Operator] Add dynamic shape support and tests for Operators. by @Aalanli in #274
- [Fusion] Enhance the prologue epilogue fusion by @yaoyaoding in #277
- [Drivers] Suppress OSError by @hjjq in #278
- [Dynamic Shape] More correctness guards by @Aalanli in #276
- [Operator] Make Convolution gemms fusible by resolving to batch_matmul by @hjjq in #279
- [External Tasks] Move task build into method call for external kernel support by @xinli-git in #282
- [Distributed] add nccl primitives by @soodoshll in #280
- [Operators] Conv2d fp16 implicit gemm kernel by @Aalanli in #283
New Contributors
- @eltociear made their first contribution in #210
- @BolinSNLHM made their first contribution in #245
- @Aalanli made their first contribution in #246
Full Changelog: v0.2.3...v0.2.4
Hidet v0.2.3
What's Changed
- [Version] Bump version to v0.2.3.dev by @yaoyaoding in #144
- [Workflow] Update workflow to use the stable version of pytorch by @yaoyaoding in #145
- [Operator] Resolve matmul to batch_matmul when lower than sm80 by @hjjq in #146
- [Dynamo] non-linear operator support + tests by @AndreSlavescu in #143
- Remove tutorial msg by @LDY1998 in #149
- [BUG] Conversion compile issue by @xinli-git in #150
- [Dynamo] Fix dynamo tests and dump graph IR by @xinli-git in #153
- [CI] Benchmark periodically by @yaoyaoding in #155
- [CI] Update bench script by @yaoyaoding in #156
- [CI] Add more env information to benchmark script by @yaoyaoding in #157
- [CI] Remove benchmark workflow, but run it in dedicated server by @yaoyaoding in #159
- [CI] Update benchmark script by @yaoyaoding in #160
- [CI] Change the search space in benchmark script from 0 to 2 by @yaoyaoding in #161
- [CI] Update benchmark script by @yaoyaoding in #162
- [CI] Update benchmark scripts by @yaoyaoding in #163
- [IR][Pass] Refactor the fusion implementation by @yaoyaoding in #164
- [Dynamo] Add operator support to run UNet2DConditionModel from diffusers by @xinli-git in #151
- [IR][Dynamic Shape] Enhance the Tensor Program IR to support dynamic shape by @yaoyaoding in #165
- [Operator] Allow matmul_f16 fuse epilogue by @yaoyaoding in #167
- [CI] Update benchmark script by @yaoyaoding in #168
- [CUDA] Lazy initializing cuda context by @yaoyaoding in #169
- [Fixbug] Allow one backend fail in benchmark script by @yaoyaoding in #170
- [Fixbug] Use auto-scheduler for fp64 reduction by @yaoyaoding in #171
- [Operator] Add `gather` operator and `torch.zeros`, `torch.neg` mapping by @yaoyaoding in #174
- [CI] Update benchmark script by @yaoyaoding in #179
- [Fixbug] Add `_stacklevel` to pytorch softmax mapping by @yaoyaoding in #178
- [IR] Add unroll pragma for loop statement by @yaoyaoding in #180
- [Operator] Flash Attention by @hjjq in #175
- [Fixbug] Fix a bug in the mapping from device to its memory pool by @yaoyaoding in #181
- [Dynamo] Small enhancements for graph dump ir and task arguments by @xinli-git in #172
- [Docs] Update install instruction by @hjjq in #182
- change norm to use smaller inputs to reduce running time by @xinli-git in #185
- [IR] Add explicit unroll by @yaoyaoding in #184
- [Runtime] Allow passing torch tensor to `PackedFunc` directly by @yaoyaoding in #183
- Refactor codegen to separate GPU/CPU code generation by @LDY1998 in #176
- [Pass] Support inline function by @yaoyaoding in #186
New Contributors
Full Changelog: v0.2.2...v0.2.3
Hidet v0.2.2
What's Changed
- [Version] Bump version to 0.2.2.dev by @yaoyaoding in #118
- [Option] Add `debug_cache_tuning` option by @yaoyaoding in #120
- [Fix] Remove lambda in shfl primitives by @hjjq in #121
- [IR][Refactor] Refactor the functor/visitor/rewriters by @yaoyaoding in #122
- [Fixbug] Fix bug in IR Printer by @hjjq in #123
- [Fixbug] Fix a bug in `IRModule.update_function` by @yaoyaoding in #124
- [Frontend] Fix typo by @digital-nomad-cheng in #127
- [Operator] Add the support of using external kernels in hidet by @yaoyaoding in #128
- [Tests] Reorganize tests files for frontends by @yaoyaoding in #129
- [Dynamo] Added Operator Support by @AndreSlavescu in #131
- [Fixbug] Allow grid compute to be inlined by @hjjq in #134
- [Graph] Cast optimizations by @xinli-git in #135
- [Fixbug] Fix a bug that map blockDim to blockIdx by @yaoyaoding in #136
- [Fixbug] Fix a bug in rule based simplifier by @yaoyaoding in #137
- [Workflow] Update concurrent graph of the ci workflow by @yaoyaoding in #138
- [Runtime] Add `src_path` and `source()` members to `CompiledFunction` by @yaoyaoding in #139
- [Runtime][IR] Support colored source code; add blockDim to extern_vars by @yaoyaoding in #140
- [Fixbug] Convert tensor to cpu before dumping by @hjjq in #141
New Contributors
- @digital-nomad-cheng made their first contribution in #127
- @xinli-git made their first contribution in #135
Full Changelog: v0.2.1...v0.2.2
Hidet v0.2.1
What's Changed
- [Version] Bump version to 0.2.1.dev by @yaoyaoding in #73
- [CI] Prevent fork repos from running workflow by @yaoyaoding in #74
- [Fixbug] Fix a bug in `trace_from` when the inputs are directly used as outputs by @yaoyaoding in #76
- [Operator] Add reduce_f16 and squeeze as Reduce's resolve variants by @hjjq in #75
- [IR] Input specification assertion message for valid IR check by @AndreSlavescu in #78
- [Operator] Add conv3d, max_pool3d, avg_pool3d by @hjjq in #79
- [Dynamo] Add the entry point registration for dynamo by @yaoyaoding in #80
- [Fix] Update shape utility functions to expect Sequence instead of List by @yaoyaoding in #86
- [Bugfix] 'double'->'float64' in onnx dtype conversion by @soodoshll in #88
- [Fix] Mark the reduce fp16 operator not fusible by @yaoyaoding in #100
- [Fixbug] Use uint64_t instead of unsigned long long for literals by @yaoyaoding in #101
- [Fixbug] Fix a bug in the minimum and maximum operator by @yaoyaoding in #102
- [Dynamo] Update dynamo registration after pytorch refactored that part by @yaoyaoding in #84
- [Fixbug] Fix bugs in binary_arithmetic op and swizzle layout by @hjjq in #104
- [Fixbug] Call fuse in reduce_fp16 operator by @yaoyaoding in #105
- [ONNX] Fix the out of bound error in onnx slice function during importing by @yaoyaoding in #106
- [Fixbug] Reverse map of binary operator by @yaoyaoding in #107
- [Fixbug] Add attributes to Clip operator by @yaoyaoding in #108
- [Fixbug] Binary arithmetic ops raise error when one is scalar on GPU by @yaoyaoding in #109
- [Graph] Refactor forward function of FlowGraph by @yaoyaoding in #110
- [Fixbug] Use int64 as the output of arg-reduce by @yaoyaoding in #111
- [README] Update readme by @yaoyaoding in #114
- [Fixbug] Fix a bug when a graph output is constant by @yaoyaoding in #113
- [Community] Create CODE_OF_CONDUCT.md by @yaoyaoding in #115
- [Community] Update issue templates by @yaoyaoding in #116
- [Fixbug] Resolve the min/max function according to compute capability by @yaoyaoding in #112
- [Workflow] Update workflow by @yaoyaoding in #117
- [Workflow] Update publish workflow by @yaoyaoding in #119
New Contributors
- @soodoshll made their first contribution in #88
Full Changelog: v0.2.0...v0.2.1
Hidet v0.2.0
What's Changed
- [Version] Bump version to 0.2.dev by @yaoyaoding in #60
- [Frontend] Add `torch.tensor` binding by @yaoyaoding in #61
- [Version] Add version to root namespace by @yaoyaoding in #62
- [FFI] Add SharedLibrary class to track the usage of dynamic library by @yaoyaoding in #63
- [Operator] Fix a bug in resize2d operator definition by @yaoyaoding in #64
- [CI] Update scripts to build wheel by @yaoyaoding in #65
- [CI] Remove docs workflow by @yaoyaoding in #66
- [Docs] Update README.md and require cuda-python>=11.6.1 by @yaoyaoding in #67
- [Docs] Add instructions for installing nightly version of hidet by @yaoyaoding in #68
- [Docs] Fixed typo in docs by @AndreSlavescu in #69
- [Operator] Add dilation support for conv2d by @hjjq in #71
- [Fixbug] Cast back to original data type in the mix precision pass by @yaoyaoding in #72
- [CI] Add automatic publish workflow (PyPI) by @yaoyaoding in #70
New Contributors
- @AndreSlavescu made their first contribution in #69
Full Changelog: v0.1...v0.2.0