Organizations

@apache @cityflow-project @tlc-pack @mlc-ai

DeepEP: an efficient expert-parallel communication library

Cuda 7,356 693 Updated Apr 3, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,402 815 Updated Mar 1, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 881 103 Updated Apr 2, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 907 65 Updated Apr 3, 2025
C++ 26 5 Updated Mar 18, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 12,805 1,422 Updated Apr 3, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,972 192 Updated Apr 3, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 6,110 613 Updated Apr 3, 2025

A collection of useful .gitignore templates

165,453 83,086 Updated Mar 21, 2025

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,423 145 Updated Apr 2, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 43,401 6,619 Updated Apr 3, 2025
Python 1,348 52 Updated Nov 21, 2024

A PyTorch native library for large model training

Python 3,533 328 Updated Apr 3, 2025

Super-Efficient RLHF Training of LLMs with Parameter Reallocation

Python 264 17 Updated Jan 13, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 828 55 Updated Apr 2, 2025

Ongoing research training transformer models at scale

Python 11,973 2,686 Updated Apr 2, 2025

FlagGems is an operator library for large language models implemented in Triton Language.

Python 473 76 Updated Apr 3, 2025

Arknights (《明日方舟》) idle-period assistant

Python 540 58 Updated Mar 20, 2025

Development repository for the Triton-Linalg conversion

C++ 183 18 Updated Feb 7, 2025

Apple GPU microarchitecture

Metal 507 26 Updated Sep 22, 2024

MLX: An array framework for Apple silicon

C++ 20,035 1,151 Updated Apr 3, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 572 42 Updated Feb 14, 2025

Awesome LLM compression research papers and tools.

1,449 93 Updated Apr 1, 2025

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 142,359 28,506 Updated Apr 3, 2025

Development repository for the Triton language and compiler

MLIR 15,077 1,897 Updated Apr 3, 2025

An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores.

Python 51 2 Updated Jul 23, 2024

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 5,904 546 Updated Mar 13, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,556 265 Updated Apr 1, 2025

✨ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows

TypeScript 82,449 61,065 Updated Mar 31, 2025