- SJTU
- Shanghai
- https://syfeng.net
Stars
DeepEP: an efficient expert-parallel communication library
A highly optimized LLM inference acceleration engine for Llama and its variants.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
SGLang is a fast serving framework for large language models and vision language models.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
verl: Volcano Engine Reinforcement Learning for LLMs
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
A high-throughput and memory-efficient inference and serving engine for LLMs
A PyTorch native library for large model training
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Ongoing research training transformer models at scale
FlagGems is an operator library for large language models implemented in Triton Language.
Development repository for the Triton-Linalg conversion
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Awesome LLM compression research papers and tools.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Development repository for the Triton language and compiler
An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
FlashInfer: Kernel Library for LLM Serving
✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows