Stars
A low-latency & high-throughput serving engine for LLMs
FlashInfer: Kernel Library for LLM Serving
Machine Learning Engineering Open Book
📰 Must-read papers and blogs on LLM-based Long Context Modeling 🔥
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
sunkx109 / llama
Forked from meta-llama/llama. Inference code for LLaMA models
This is a Chinese translation of the CUDA programming guide
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Fast and memory-efficient exact attention
Curated collection of papers on MoE model inference
SGLang is a fast serving framework for large language models and vision language models.
Code for the paper "Fine-Tuning Language Models from Human Preferences"
Efficient and easy multi-instance LLM serving
AISys-01 / vllm-CachedAttention
Forked from vllm-project/vllm. Code based on vLLM for the paper "Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention".
This project aims to share the technical principles of large language models along with hands-on engineering experience (LLM productionization and real-world LLM application deployment).
Transformer-related optimization, including BERT and GPT
High-speed Large Language Model Serving for Local Deployment
A self-learning tutorial for CUDA high-performance programming.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.