TransformerEngine Public
Forked from NVIDIA/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
Python Apache License 2.0 Updated Oct 10, 2024
cs-self-learning Public
Forked from PKUFlyingPig/cs-self-learning
A self-study guide to computer science
HTML MIT License Updated Jul 26, 2024
wplf Public
This is a special repository for my GitHub profile.
Apache License 2.0 Updated Jun 28, 2024
Megatron-LM Public
Forked from NVIDIA/Megatron-LM
Ongoing research training transformer models at scale
Python Other Updated Jun 24, 2024
Open standard for machine learning interoperability
Python Apache License 2.0 Updated Jan 16, 2024
Compass_Optimizer Public
Forked from Arm-China/Compass_Optimizer
Compass Optimizer (OPT for short) is part of the Zhouyi Compass Neural Network Compiler. The OPT is designed for converting the float Intermediate Representation (IR) generated by the Compass Unif…
Python Apache License 2.0 Updated Dec 22, 2023
OI-wiki Public
Forked from OI-wiki/OI-wiki
🌟 Wiki of OI / ICPC for everyone. (An online strategy guide for a certain large-scale game, featuring dazzling arithmetic magic)
TypeScript Updated Dec 20, 2023
how-to-optimize-gemm Public
Forked from tpoisonooo/how-to-optimize-gemm
Row-major matmul optimization
C++ GNU General Public License v3.0 Updated Sep 9, 2023
How_to_optimize_in_GPU Public
Forked from Liu-xiandong/How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
Cuda Apache License 2.0 Updated Jul 29, 2023
Compass_Unified_Parser Public
Forked from Arm-China/Compass_Unified_Parser
armchina NPU parser
Python Apache License 2.0 Updated Jun 25, 2023
tinyflow Public
Forked from tqchen/tinyflow
Tutorial code on how to build your own Deep Learning System in 2k Lines
C++ Apache License 2.0 Updated Oct 4, 2018