
Starred repositories
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
No fortress, purely open ground. OpenManus is Coming.
A global object store with an S3 interface that optimizes performance and cost
A tool to detect infrastructure issues on cloud native AI systems
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
MetaX-MACA / FlashMLA
Forked from deepseek-ai/FlashMLA. Fast and efficient attention method exploration and implementation.
DeepEP: an efficient expert-parallel communication library
Muon optimizer: >30% sample efficiency with <3% wallclock overhead
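Muon's distinguishing step is orthogonalizing the momentum matrix with a quintic Newton-Schulz iteration before applying the update. A minimal NumPy sketch of that iteration, under the assumption that the coefficients below match the ones published in the Muon repository (the helper name `newton_schulz` is ours):

```python
import numpy as np

def newton_schulz(G, steps=5, eps=1e-7):
    """Approximately orthogonalize G (push its singular values toward 1)
    via a quintic Newton-Schulz iteration, the core of Muon's update."""
    a, b, c = 3.4445, -4.7750, 2.0315   # coefficients as published for Muon
    X = G / (np.linalg.norm(G) + eps)   # Frobenius-normalize: spectrum into [0, 1]
    transposed = G.shape[0] > G.shape[1]
    if transposed:                      # iterate on the short side for efficiency
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X
```

After a few steps the singular values of the output oscillate in a narrow band around 1, which is "orthogonal enough" for the optimizer; running the iteration in low precision is part of why the wallclock overhead stays small.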
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
MoBA: Mixture of Block Attention for Long-Context LLMs
Modeling, training, eval, and inference code for OLMo
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Distribute and run LLMs with a single file.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Making Docker and Kubernetes management easy.
vCluster - Create fully functional virtual Kubernetes clusters - Each vcluster runs inside a namespace of the underlying k8s cluster. It's cheaper than creating separate full-blown clusters and it …
MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction
Pytorch domain library for recommendation systems
Official implementation of Half-Quadratic Quantization (HQQ)
Serve, optimize and scale PyTorch models in production
A PyTorch native library for large model training