smile-luobin


Starred repositories


🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

Python · 51,564 stars · 6,082 forks · Updated Mar 10, 2025

No fortress, purely open ground. OpenManus is Coming.

Python · 27,074 stars · 4,065 forks · Updated Mar 11, 2025

A global object store with an S3 interface that optimizes performance and cost

Rust · 1 star · 9 forks · Updated Jul 18, 2023

Multi-GPU communication profiler and visualizer

C · 26 stars · 2 forks · Updated Jun 10, 2024

A tool to detect infrastructure issues in cloud-native AI systems

Python · 24 stars · 15 forks · Updated Feb 27, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 505 stars · 35 forks · Updated Mar 11, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ · 7,808 stars · 694 forks · Updated Mar 11, 2025

Python · 46 stars · 2 forks · Updated Mar 11, 2025

Expert Parallelism Load Balancer

Python · 1,050 stars · 152 forks · Updated Feb 27, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python · 2,562 stars · 252 forks · Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

CUDA · 4,886 stars · 482 forks · Updated Mar 11, 2025

Fast and efficient attention method exploration and implementation.

C++ · 19 stars · 3 forks · Updated Mar 7, 2025

DeepEP: an efficient expert-parallel communication library

CUDA · 7,112 stars · 614 forks · Updated Mar 11, 2025

Muon optimizer: over 30% higher sample efficiency with under 3% wall-clock overhead

Python · 485 stars · 25 forks · Updated Mar 9, 2025

FlashMLA: Efficient MLA decoding kernels

C++ · 11,240 stars · 785 forks · Updated Mar 1, 2025

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

Python · 129 stars · 10 forks · Updated Mar 11, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python · 1,635 stars · 93 forks · Updated Mar 7, 2025

Modeling, training, eval, and inference code for OLMo

Python · 5,323 stars · 569 forks · Updated Mar 11, 2025

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 752 stars · 62 forks · Updated Sep 4, 2024

Distribute and run LLMs with a single file.

C++ · 21,914 stars · 1,150 forks · Updated Mar 11, 2025

Tensor library for machine learning

C++ · 12,063 stars · 1,165 forks · Updated Mar 11, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 12,592 stars · 838 forks · Updated Mar 7, 2025

Making Docker and Kubernetes management easy.

TypeScript · 32,261 stars · 2,547 forks · Updated Mar 11, 2025

vCluster: Create fully functional virtual Kubernetes clusters. Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it …

Go · 10,588 stars · 476 forks · Updated Mar 10, 2025

MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction

Python · 80 stars · 13 forks · Updated Oct 29, 2024

LIMO: Less is More for Reasoning

Python · 829 stars · 36 forks · Updated Feb 24, 2025

PyTorch domain library for recommendation systems

Python · 2,047 stars · 485 forks · Updated Mar 11, 2025

Official implementation of Half-Quadratic Quantization (HQQ)

Python · 763 stars · 77 forks · Updated Feb 24, 2025

Serve, optimize and scale PyTorch models in production

Java · 4,299 stars · 875 forks · Updated Mar 3, 2025

A PyTorch native library for large model training

Python · 3,431 stars · 309 forks · Updated Mar 10, 2025