
Starred repositories
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
No fortress, purely open ground. OpenManus is Coming.
A global object store with an S3 interface that optimizes performance and cost
A tool to detect infrastructure issues on cloud native AI systems
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
MetaX-MACA / FlashMLA
Forked from deepseek-ai/FlashMLA. Fast and efficient attention method exploration and implementation.
DeepEP: an efficient expert-parallel communication library
Muon optimizer: >30% sample efficiency with <3% wallclock overhead
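Muon's distinguishing step is orthogonalizing the momentum matrix with a quintic Newton-Schulz iteration before applying the update. A minimal NumPy sketch of that iteration, under the assumption that the coefficients below match the ones published in the Muon repository (the helper name `newton_schulz` is ours):

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G (push its singular values toward 1)
    via a quintic Newton-Schulz iteration, the core of Muon's update."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients as published for Muon
    X = G / (np.linalg.norm(G) + eps)   # Frobenius-normalize: spectrum into [0, 1]
    transposed = G.shape[0] > G.shape[1]
    if transposed:                      # iterate on the short side for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

After a few steps the singular values of the output oscillate in a narrow band around 1, which is "orthogonal enough" for the optimizer; running the iteration in low precision is part of why the wallclock overhead stays small.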
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
MoBA: Mixture of Block Attention for Long-Context LLMs
Modeling, training, eval, and inference code for OLMo
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Distribute and run LLMs with a single file.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Making Docker and Kubernetes management easy.
vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it …
MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction
Pytorch domain library for recommendation systems
Official implementation of Half-Quadratic Quantization (HQQ)
Serve, optimize and scale PyTorch models in production
A PyTorch native library for large model training