Stars
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
R1-onevision, a visual language model capable of deep CoT reasoning.
SlamKit is an open-source toolkit for efficient training of speech language models (SpeechLMs). It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".
AlpinDale / mergekit-LGPL
Forked from arcee-ai/mergekit. Tools for merging pretrained large language models.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient MLA Decoding Kernel for Hopper GPUs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Muon optimizer: >30% sample-efficiency gain with <3% wall-clock overhead (a sketch of the update rule appears after this list).
Clearbox AI's all-in-one solution for generation and evaluation of synthetic tabular and time-series data.
Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper
Examples and guides for using the Gemini API
Implementation of the proposed DeepCrossAttention by Heddes et al at Google research, in Pytorch
A Qwen 0.5B reasoning model trained on OpenR1-Math-220k
A very simple GRPO implementation for reproducing R1-like LLM reasoning (a sketch of the group-relative advantage computation appears after this list).
[ICML 2024 Best Paper] Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution (https://arxiv.org/abs/2310.16834)
Official PyTorch implementation for "Large Language Diffusion Models"
BOM, STL files and instructions for PAROL6 3D printed robot arm
💬 An extensive collection of exceptional resources dedicated to the captivating world of talking face synthesis! ⭐ If you find this repo useful, please give it a star! 🤩
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
Reverse Engineering: Decompiling Binary Code with Large Language Models
Pretraining code for a large-scale depth-recurrent language model
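For context on the Muon optimizer entry above: Muon keeps SGD-style momentum for each 2-D weight matrix and orthogonalizes the update with a few Newton-Schulz iterations before applying it. The sketch below is a rough illustration based on public descriptions of the method; the function names, constants, and structure are assumptions, not code from the starred repository.

```python
# Minimal sketch of a Muon-style update for a single 2-D weight matrix.
# Coefficients and structure are assumptions drawn from public descriptions
# of Muon, not taken from the repository listed above.
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map G to a semi-orthogonal matrix via a quintic iteration."""
    a, b, c = 3.4445, -4.7750, 2.0315          # iteration coefficients (assumed)
    X = G / (G.norm() + 1e-7)                  # normalize so the iteration converges
    if X.shape[0] > X.shape[1]:                # work with the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    if G.shape[0] > G.shape[1]:                # restore the original orientation
        X = X.T
    return X

def muon_step(weight, grad, momentum_buf, lr=0.02, momentum=0.95):
    """One Muon-style step: accumulate momentum, then apply the orthogonalized update."""
    momentum_buf.mul_(momentum).add_(grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    weight.add_(update, alpha=-lr)

# Toy usage on a random 2-D weight.
w = torch.randn(64, 128)
buf = torch.zeros_like(w)
g = torch.randn_like(w)
muon_step(w, g, buf)
```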
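And for the GRPO entry: GRPO scores a group of sampled completions for the same prompt and uses each completion's reward, normalized against the rest of its group, as the advantage, avoiding a separate value model. Below is a minimal sketch of that advantage computation; the function and variable names are illustrative, not the repo's actual code.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Names are illustrative; this is not the starred repo's actual code.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar rewards for sampled completions.

    Each completion's advantage is its reward standardized against the other
    completions sampled for the same prompt.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0]])
print(group_relative_advantages(rewards))
```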