Starred repositories
A Python framework for high performance GPU simulation and graphics
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 3 hours! 🌏
This is the Rust course used by the Android team at Google. It provides the material to teach Rust quickly.
An open-source C++ library developed and used at Facebook.
Efficient, Flexible and Portable Structured Generation
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Examples for using ONNX Runtime for model training.
Large Language Model (LLM) Systems Paper List
A retargetable MLIR-based machine learning compiler and runtime toolkit.
OCR, layout analysis, reading order, table recognition in 90+ languages
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently
GPU programming related news and material links
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Code samples used on cloud.google.com
FlashInfer: Kernel Library for LLM Serving
A throughput-oriented high-performance serving framework for LLMs
Lightning-fast serving engine for any AI model of any size. Flexible. Easy. Enterprise-scale.
A curated list of Rust code and resources.
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet) / If you find it useful, please star this project, thanks~
llama3 implementation one matrix multiplication at a time
A minimal GPU design in Verilog to learn how GPUs work from the ground up