Stars
FlashInfer: Kernel Library for LLM Serving
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
VideoSys: An easy and efficient system for video generation
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Efficient Triton Kernels for LLM Training
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
SGLang is a fast serving framework for large language models and vision language models.
A modular graph-based Retrieval-Augmented Generation (RAG) system
[NeurIPS'24 Spotlight, ICLR'25] Speeds up long-context LLM inference by computing attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an …
RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / LLaMA Factory / Ultralytics / veRL …
[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
SD.Next: All-in-one tool for AI image generation
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Stable Diffusion web UI
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
OneDiff: An out-of-the-box acceleration library for diffusion models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Use PEFT or full-parameter training to fine-tune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
Start building LLM-empowered multi-agent applications in an easier way.
An awesome & curated list of best LLMOps tools for developers
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …