Siyuan Feng Hzfengsy

ML System & Compiler | ASF Member | PMC Member of Apache TVM

526 followers · 57 following

SJTU
Shanghai
https://syfeng.net

Achievements

x3 x3 x3 x3

Highlights

Organizations

Lists (3)

Stars

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda · 7,356 stars · 693 forks · Updated Apr 3, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

C++ · 11,402 stars · 815 forks · Updated Mar 1, 2025

zhihu / ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 881 stars · 103 forks · Updated Apr 2, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ · 907 stars · 65 forks · Updated Apr 3, 2025

mlc-ai / mlc-python

C++ · 26 stars · 5 forks · Updated Mar 18, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python · 12,805 stars · 1,422 forks · Updated Apr 3, 2025

deepseek-ai / DeepSeek-V3

Python · 94,966 stars · 15,368 forks · Updated Mar 16, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ · 2,972 stars · 192 forks · Updated Apr 3, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 6,110 stars · 613 forks · Updated Apr 3, 2025

github / gitignore

A collection of useful .gitignore templates

165,453 stars · 83,086 forks · Updated Mar 21, 2025

evalplus / evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python · 1,423 stars · 145 forks · Updated Apr 2, 2025

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 43,401 stars · 6,619 forks · Updated Apr 3, 2025

Open-Source-O1 / Open-O1

Python · 1,348 stars · 52 forks · Updated Nov 21, 2024

pytorch / torchtitan

A PyTorch native library for large model training

Python · 3,533 stars · 328 forks · Updated Apr 3, 2025

openpsi-project / ReaLHF

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python · 264 stars · 17 forks · Updated Jan 13, 2025

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 828 stars · 55 forks · Updated Apr 2, 2025

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python · 11,973 stars · 2,686 forks · Updated Apr 2, 2025

FlagOpen / FlagGems

FlagGems is an operator library for large language models implemented in Triton Language.

Python · 473 stars · 76 forks · Updated Apr 3, 2025

ArkMowers / arknights-mower

*Arknights* (《明日方舟》) idle-play automation assistant

Python · 540 stars · 58 forks · Updated Mar 20, 2025

Cambricon / triton-linalg

Development repository for the Triton-Linalg conversion

C++ · 183 stars · 18 forks · Updated Feb 7, 2025

philipturner / metal-benchmarks

Apple GPU microarchitecture

Metal · 507 stars · 26 forks · Updated Sep 22, 2024

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ · 20,035 stars · 1,151 forks · Updated Apr 3, 2025

microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python · 572 stars · 42 forks · Updated Feb 14, 2025

HuangOwen / Awesome-LLM-Compression

Awesome LLM compression research papers and tools.

1,449 stars · 93 forks · Updated Apr 1, 2025

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

Python · 142,359 stars · 28,506 forks · Updated Apr 3, 2025

triton-lang / triton

Development repository for the Triton language and compiler

MLIR · 15,077 stars · 1,897 forks · Updated Apr 3, 2025

nox-410 / tvm.tl

An extension of TVMScript to write simple and high performance GPU kernels with tensorcore.

Python · 51 stars · 2 forks · Updated Jul 23, 2024

pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

Python · 5,904 stars · 546 forks · Updated Mar 13, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 2,556 stars · 265 forks · Updated Apr 1, 2025

ChatGPTNextWeb / NextChat

TypeScript · 82,449 stars · 61,065 forks · Updated Mar 31, 2025

Lists (3)

🚀 My stack

📖 Research

🛠️ tools