
Starred repositories
- verl: Volcano Engine Reinforcement Learning for LLMs
- A Zotero plugin for syncing items and notes into Notion
- A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deploym…
- A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
- Analyze computation-communication overlap in V3/R1.
- A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
- Ongoing research training transformer models at scale
- DeepEP: an efficient expert-parallel communication library
- Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
- Fully open reproduction of DeepSeek-R1
- The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
- A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
- 🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization (a minimal sketch of the merge loop follows this list).
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
- Open-Sora: Democratizing Efficient Video Production for All
- [NeurIPS 2024 Best Paper] [GPT beats diffusion 🔥] [scaling laws in visual generation 📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
- An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
- The Open Cookbook for Top-Tier Code Large Language Models
- A high-throughput and memory-efficient inference and serving engine for LLMs
- A collection of phenomena observed during the scaling of big foundation models, which may develop into consensus, principles, or laws in the future
- Secrets of RLHF in Large Language Models Part I: PPO
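
The BPE entry above refers to the byte-pair-encoding training loop used in LLM tokenizers. As a rough illustration only (not code from that repository), a minimal sketch of the merge step might look like this:

```python
from collections import Counter

def most_common_pair(ids):
    """Return the most frequent adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

# Illustrative training loop: start from raw UTF-8 bytes, repeatedly merge
# the most frequent pair, and assign each merge the next free token id.
text = "low lower lowest"          # toy corpus, purely for demonstration
ids = list(text.encode("utf-8"))
merges = {}
for new_id in range(256, 256 + 5):  # 5 merges; real vocabularies use thousands
    pair = most_common_pair(ids)
    ids = merge(ids, pair, new_id)
    merges[pair] = new_id
print(merges)
```

Encoding new text then amounts to replaying the recorded merges in order; the sketch omits that step and any handling of special tokens.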