kaimo455
🚩
Stay Calm AND Carry On
  • Shenzhen

Starred repositories


verl: Volcano Engine Reinforcement Learning for LLMs

Python 4,506 418 Updated Mar 8, 2025

A Zotero plugin for syncing items and notes into Notion

TypeScript 2,614 111 Updated Mar 1, 2025

A unified library of state-of-the-art model optimization techniques such as quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deploym…

Python 770 56 Updated Mar 3, 2025

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python 8,575 1,434 Updated Feb 26, 2025

Analyze computation-communication overlap in V3/R1.

898 116 Updated Mar 3, 2025

Expert Parallelism Load Balancer

Python 1,040 151 Updated Feb 27, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,541 246 Updated Mar 5, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,249 377 Updated Mar 9, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,853 474 Updated Mar 8, 2025

Ongoing research training transformer models at scale

Python 11,679 2,621 Updated Mar 8, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,092 611 Updated Mar 6, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,696 195 Updated Mar 4, 2025

Fully open reproduction of DeepSeek-R1

Python 22,416 2,011 Updated Mar 9, 2025

The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".

274 18 Updated Jan 21, 2025

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 13,281 2,723 Updated Mar 9, 2025

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python 59,073 5,993 Updated Aug 24, 2024

Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.

Python 9,462 898 Updated Jul 1, 2024
Python 181 16 Updated Mar 9, 2025

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 1,565 271 Updated Jan 16, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 23,513 2,328 Updated Mar 7, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 6,861 442 Updated Jan 12, 2025

Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".

Python 773 102 Updated Aug 20, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,051 162 Updated Mar 27, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,771 169 Updated Mar 7, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python 5,429 532 Updated Mar 9, 2025

CUDA/Metal accelerated language model inference

C 522 23 Updated Dec 18, 2024

The Open Cookbook for Top-Tier Code Large Language Model

Python 1,633 99 Updated Dec 8, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40,857 6,150 Updated Mar 9, 2025

A collection of phenomenons observed during the scaling of big foundation models, which may be developed into consensus, principles, or laws in the future

277 19 Updated Aug 13, 2023

Secrets of RLHF in Large Language Models Part I: PPO

Python 1,325 97 Updated Mar 3, 2024