Starred repositories
Zstandard - Fast real-time compression algorithm
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) for deep learning on Tensor Cores.
Triton is a dynamic binary analysis library. Build your own program analysis tools, automate your reverse engineering, perform software verification or just emulate code.
OpenPCDet Toolbox for LiDAR-based 3D Object Detection.
OpenMMLab's next-generation platform for general 3D object detection.
Samples for CUDA developers that demonstrate features of the CUDA Toolkit
A high-throughput and memory-efficient inference and serving engine for LLMs
[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Synthesizer for optimal collective communication algorithms
Optimized primitives for collective multi-GPU communication
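Provides a practical interactive interface for GPT, GLM, and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…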
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores