CUDA-Learn-Notes Public
Forked from DefTruth/CUDA-Learn-Notes
📚Tensor/CUDA Cores, 📖150+ CUDA Kernels, ⚡️⚡️toy-hgemm library with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS 🎉🎉).
Cuda GNU General Public License v3.0 Updated Dec 16, 2024
Cute-Gemm-Optimization Public
Forked from DD-DuDa/Cute-Learning
Makefile MIT License Updated Dec 16, 2024
cutlass Public
Forked from NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
C++ Other Updated Dec 16, 2024
Awesome-KV-Cache-Compression Public
Forked from October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
MIT License Updated Dec 5, 2024
blog Public
Forked from huggingface/blog
Public repo for HF blog posts
Jupyter Notebook Updated Nov 20, 2024
flux Public
Forked from bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
C++ Apache License 2.0 Updated Oct 30, 2024
resource-stream Public
Forked from gpu-mode/resource-stream
GPU programming related news and material links
MIT License Updated Sep 23, 2024
triton Public
Forked from triton-lang/triton
Development repository for the Triton language and compiler
C++ MIT License Updated Jul 1, 2024
CUDALibrarySamples Public
Forked from NVIDIA/CUDALibrarySamples
CUDA Library Samples
Cuda Other Updated Apr 28, 2024
tiny-flash-attention Public
Forked from 66RING/tiny-flash-attention
flash attention tutorial written in python, triton, cuda, cutlass
Cuda Updated Apr 16, 2024
APPy Public
Forked from habanero-lab/APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to OpenMP, and automatically compiles the annotated code to GP…
Python MIT License Updated Feb 28, 2024
cutlass_fpA_intB_gemm Public
Forked from tlc-pack/cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
C++ Apache License 2.0 Updated Feb 28, 2024
kernl Public
Forked from ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Jupyter Notebook Apache License 2.0 Updated Feb 16, 2024
OneNeuralNetwork Public
Forked from matrix97317/OneNeuralNetwork
This is a cross-chip platform collection of operators and a unified neural network library.
Python Apache License 2.0 Updated Nov 3, 2023
Cpp-Templates-2ed Public
Forked from downdemo/Cpp-Templates-2ed
C++11/14/17/20 templates and generic programming, the most complex and difficult technical details of C++, indispensable in building infrastructure libraries.
C++ Apache License 2.0 Updated Sep 4, 2023
FlashAttention20 Public
Forked from kyegomez/FlashAttention20
Get down and dirty with FlashAttention2.0 in pytorch, plug in and play no complex CUDA kernels
Python MIT License Updated Jul 31, 2023
miniob Public
Forked from oceanbase/miniob
MiniOB is one mini database, helping developers to learn how database works.
C++ Mulan Permissive Software License, Version 2 Updated Jul 27, 2023
concurrentqueue Public
Forked from cameron314/concurrentqueue
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
C++ Other Updated Jun 19, 2023
TensorRT-in-Action Public
Forked from DD-DuDa/TensorRT-in-Action
TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with corresponding Jupyter Notebooks.
Jupyter Notebook Apache License 2.0 Updated Jun 1, 2023
DeepLearningExamples Public
Forked from NVIDIA/DeepLearningExamples
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
Jupyter Notebook Updated May 8, 2023
DeepLearningSystem Public
Forked from chenzomi12/AISystem
Deep Learning System core principles introduction.
Jupyter Notebook Apache License 2.0 Updated May 7, 2023
bitsandbytes Public
Forked from bitsandbytes-foundation/bitsandbytes
8-bit CUDA functions for PyTorch
Python MIT License Updated Apr 29, 2023
llama.onnx Public
Forked from wolf1981/llama.onnx
llama/alpaca onnx models, quantization and testcase
Python GNU General Public License v3.0 Updated Apr 19, 2023
how-to-optimize-gemm-1 Public
Forked from tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
C++ GNU General Public License v3.0 Updated Apr 5, 2023
HPC-Learning-Notes Public
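Forked from Eddie-Wang1120/HPC-Learning-Notes
Study notes on high-performance computing, including the notes themselves and code demos for the related topics; still being improved. If you find it helpful, please give it a Star; it helps the author a lot. Thank you!
Jupyter Notebook Updated Mar 28, 2023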
onnxruntime Public
Forked from microsoft/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
C++ MIT License Updated Mar 17, 2023
ray Public
Forked from ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
Python Apache License 2.0 Updated Feb 6, 2023