Stars
A high-throughput and memory-efficient inference and serving engine for LLMs
Enjoy the magic of Diffusion models!
No fortress, purely open ground. OpenManus is Coming.
Building DeepSeek R1 from Scratch
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Muon optimizer: >30% sample-efficiency gain with <3% wall-clock overhead
Fully open reproduction of DeepSeek-R1
Solve Visual Understanding with Reinforced VLMs
PyTorch distributed training tutorials
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and real-world application deployment).
AISystem covers the full AI systems stack, including AI chips, AI compilers, and AI inference and training frameworks.
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practices for training LLaMA models in Megatron-LM.
Stable-Hair: Real-World Hair Transfer via Diffusion Model (AAAI 2025)
《代码随想录》LeetCode problem-solving guide: a recommended order for 200 classic problems, 600k words of detailed illustrated explanations, video walkthroughs of hard points, 50+ mind maps, with solutions in C++, Java, Python, Go, JavaScript, and more, so algorithm study is no longer confusing! 🔥🔥 Take a look; you'll wish you'd found it sooner! 🚀
Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization
Transformer seq2seq model: a program that builds a language translator from a parallel corpus
Transformer: PyTorch Implementation of "Attention Is All You Need"
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
Ongoing research training transformer models at scale