Stars
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
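A minimal sketch of how depyf is typically used, assuming its prepare_debug context manager; the dump directory and toy function are placeholders:

```python
import torch
import depyf

@torch.compile
def toy(x):
    return torch.sin(x) + torch.cos(x)

# Run compiled code inside the context; depyf dumps decompiled Dynamo/Inductor
# artifacts into the directory for inspection (directory name is an example).
with depyf.prepare_debug("./depyf_dump"):
    toy(torch.randn(8))
```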
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Janus-Series: Unified Multimodal Understanding and Generation Models
Ring attention implementation with flash attention
Efficient LLM Inference over Long Sequences
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
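A minimal text-to-image sketch with Diffusers; the checkpoint id is only an example, and other Hub pipelines load the same way:

```python
import torch
from diffusers import DiffusionPipeline

# Example checkpoint; fp16 weights keep memory use modest on a single GPU.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```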
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
📚 200+ Tensor/CUDA Core kernels, ⚡️ flash-attn-mma, ⚡️ hgemm with WMMA, MMA, and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).
SGLang is a fast serving framework for large language models and vision language models.
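A rough sketch of talking to an SGLang server through its OpenAI-compatible endpoint; the model path and port are example values:

```python
# Start the server first (model path and port are examples):
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
import openai

client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize what a serving framework does."}],
)
print(resp.choices[0].message.content)
```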
Optimized primitives for collective multi-GPU communication
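NCCL itself is a C/C++ library; one common way to exercise its collectives from Python is through the torch.distributed "nccl" backend, sketched here as an all-reduce across two GPUs (the script name is illustrative):

```python
# Launch with: torchrun --nproc_per_node=2 allreduce_demo.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

t = torch.ones(4, device="cuda") * (rank + 1)
dist.all_reduce(t, op=dist.ReduceOp.SUM)  # every rank ends up with the elementwise sum
print(f"rank {rank}: {t}")

dist.destroy_process_group()
```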
OpenAI Triton backend for Intel® GPUs
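For context, a standard Triton kernel of the kind such a backend compiles, here a plain vector add; on Intel GPUs the tensors would live on an "xpu"-style device rather than "cuda", which is assumed below:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")  # "cuda" assumed; device name differs per backend
print(torch.allclose(add(x, x), x + x))
```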
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
Hackable and optimized Transformers building blocks, supporting a composable construction.
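A small sketch using xFormers' memory-efficient attention operator; shapes, dtype, and device are illustrative:

```python
import torch
import xformers.ops as xops

# Attention inputs laid out as (batch, sequence, heads, head_dim).
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.float16)

# Memory-efficient attention kernel; an attention bias / causal mask can optionally be passed.
out = xops.memory_efficient_attention(q, k, v)
print(out.shape)
```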
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
FastAPI framework, high performance, easy to learn, fast to code, ready for production
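The canonical minimal FastAPI app, for reference; the route and module name are examples (serve with `uvicorn main:app --reload`):

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
def read_item(item_id: int, q: str | None = None):
    # Path and query parameters are parsed and validated from the type hints.
    return {"item_id": item_id, "q": q}
```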
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
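A minimal training-loop sketch with Accelerate; the toy model and data stand in for a real setup:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # detects device, mixed precision, and distributed config

# Toy model and data; placeholders for a real training setup.
model = nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)), batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward() so mixed-precision/DDP hooks apply
    optimizer.step()
```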
A minimal GPU design in Verilog to learn how GPUs work from the ground up
📰 Must-read papers and blogs on Speculative Decoding ⚡️
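For orientation, a toy greedy draft-and-verify loop illustrating the idea behind speculative decoding; `target` and `draft` are placeholder callables, batch size 1 is assumed, and real implementations use a probabilistic accept/reject rule rather than exact greedy matching:

```python
import torch

@torch.no_grad()
def speculative_decode_greedy(target, draft, ids, k=4, max_new=32):
    """`target` and `draft` map token ids [1, L] to logits [1, L, V]."""
    while max_new > 0:
        # 1) Cheap draft model proposes k tokens autoregressively.
        proposal = ids
        for _ in range(k):
            nxt = draft(proposal)[:, -1].argmax(-1, keepdim=True)
            proposal = torch.cat([proposal, nxt], dim=-1)
        # 2) Expensive target model scores all k drafted positions in one forward pass.
        tgt_pred = target(proposal)[:, -k - 1:-1].argmax(-1)  # target's choice at each drafted position
        drafted = proposal[:, -k:]
        # 3) Keep the longest agreeing prefix; on the first mismatch, take the target's token instead.
        agree = (tgt_pred == drafted)[0]
        n_ok = int(agree.cumprod(0).sum())
        accepted = drafted[:, :n_ok]
        if n_ok < k:
            accepted = torch.cat([accepted, tgt_pred[:, n_ok:n_ok + 1]], dim=-1)
        ids = torch.cat([ids, accepted], dim=-1)
        max_new -= accepted.shape[-1]
    return ids
```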