-
flashinfer Public
Forked from flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Cuda Apache License 2.0 Updated Feb 12, 2025 -
TensorRT-LLM Public
Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
C++ Apache License 2.0 Updated Feb 10, 2025 -
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Jan 18, 2025 -
CUDA-Learn-Notes Public
Forked from DefTruth/CUDA-Learn-Notes
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Cuda GNU General Public License v3.0 Updated Dec 29, 2024 -
ZhiLight Public
Forked from zhihu/ZhiLight
A highly optimized inference acceleration engine for Llama and its variants.
C++ Apache License 2.0 Updated Dec 10, 2024 -
lightllm Public
Forked from ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Python Apache License 2.0 Updated Dec 9, 2024 -
armnn Public
Forked from ARM-software/armnn
Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
C++ MIT License Updated Oct 24, 2024 -
composable_kernel Public
Forked from ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
C++ Other Updated Sep 18, 2024 -
cuda-training-series Public
Forked from olcf/cuda-training-series
Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/)
Cuda Updated Aug 19, 2024 -
flash-attention Public
Forked from Dao-AILab/flash-attention
Fast and memory-efficient exact attention
Python BSD 3-Clause "New" or "Revised" License Updated Aug 6, 2024 -
pytorch Public
Forked from pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Python Other Updated Jul 15, 2024 -
cutlass-kernels Public
Forked from ColfaxResearch/cutlass-kernels
Cuda MIT License Updated Jul 11, 2024 -
llm-numbers Public
Forked from ray-project/llm-numbers
Numbers every LLM developer should know
Updated Jan 16, 2024 -
INT8-Flash-Attention-FMHA-Quantization Public
Forked from jundaf2/INT8-Flash-Attention-FMHA-Quantization
Cuda Updated Sep 15, 2023 -
awesome-tensor-compilers Public
Forked from merrymercy/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
Updated Apr 2, 2023 -
MegEngine Public
Forked from MegEngine/MegEngine
MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation
C++ Apache License 2.0 Updated Feb 7, 2023 -
llvm-project Public
Forked from llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at…
Other Updated Feb 7, 2023 -
MNN Public
Forked from alibaba/MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
-
maxas Public
Forked from NervanaSystems/maxas
Assembler for NVIDIA Maxwell architecture
Sass MIT License Updated Jan 3, 2023 -
perf-ninja Public
Forked from dendibakh/perf-ninja
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
C++ Updated Jan 1, 2023 -
gdb-dashboard Public
Forked from cyrus-and/gdb-dashboard
Modular visual interface for GDB in Python
Python MIT License Updated Oct 19, 2022 -
folly Public
Forked from facebook/folly
An open-source C++ library developed and used at Facebook.
C++ Apache License 2.0 Updated Oct 15, 2022 -
dev-sidecar Public
Forked from docmirror/dev-sidecar
Developer sidecar: a workaround for when GitHub will not load, with acceleration for GitHub, git clone, git release downloads, and Stack Overflow
JavaScript Mozilla Public License 2.0 Updated Aug 30, 2022