Tao ZHANG TaoZQY

Talk is cheap, show me your code.

0 followers · 13 following

USTC

Stars

Relaxed-System-Lab / HexGen

[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.

Python 22 1 Updated May 6, 2024

DaDaDouDouer / atschool

Online campus platform for trading second-hand books

Java 20 Updated Mar 8, 2018

Lei-Kun / DRL-and-graph-neural-network-for-routing-problems

This is the official code for the published paper 'Solve routing problems with a residual edge-graph attention neural network'

Python 223 26 Updated Sep 5, 2023

guchengwuyue / yshop-drink

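yshop Yixiang ordering system (scan-to-order): online ordering (delivery and pickup) in mini-program mode, with multi-store support and SaaS multi-tenant support. Base technology: Java17 + springboot3 + vue3 + uniapp (vue3), supporting H5 and WeChat mini programs. A front-end/back-end separated ordering system built on a currently popular technology stack: SpringBoot3, Spring Security OAuth2, MybatisPlus, SpringSecurity, jwt, …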

PLpgSQL 774 214 Updated Mar 11, 2025

ZJU-LLMs / Foundations-of-LLMs

9,173 784 Updated Jan 14, 2025

tonyzhao-jt / LLM-PQ

Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"

Jupyter Notebook 28 2 Updated Mar 5, 2024

bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Python 9,507 545 Updated Sep 7, 2024

AmberLJC / LLMSys-PaperList

Large Language Model (LLM) Systems Paper List

823 32 Updated Mar 19, 2025

zhenyutiancandy / HexGen

Serving LLMs on heterogeneous decentralized clusters.

Python 10 10 Updated Nov 30, 2023

Hsword / SpotServe

SpotServe: Serving Generative Large Language Models on Preemptible Instances

112 9 Updated Feb 22, 2024

llm-qoe / llm-qoe.github.io

Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services

CSS 1 Updated Apr 26, 2024

Thesys-lab / Helix-ASPLOS25

Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"

Python 28 6 Updated Nov 24, 2024

interestingLSY / swiftLLM

A tiny yet powerful LLM inference system tailored for researching purpose. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 151 12 Updated Jul 5, 2024

LoongServe / LoongServe

Jupyter Notebook 89 7 Updated Nov 11, 2024

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 13,045 883 Updated Mar 22, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,891 185 Updated Mar 20, 2025

DicardoX / Research-Space

This repository is established to store personal notes and annotated papers during daily research.

115 8 Updated Mar 21, 2025

aliyun / SimAI

C++ 450 62 Updated Mar 20, 2025

AlibabaPAI / llumnix

Efficient and easy multi-instance LLM serving

Python 339 27 Updated Mar 21, 2025

zhengzangw / Sequence-Scheduling

PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".

Python 85 17 Updated May 23, 2023

microsoft / vidur

A large-scale simulation framework for LLM inference

Python 350 61 Updated Nov 19, 2024

hao-ai-lab / MuxServe

Jupyter Notebook 53 4 Updated Jun 13, 2024

hao-ai-lab / vllm-ltr

[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank

Python 42 10 Updated Nov 4, 2024

academicpages / academicpages.github.io

Github Pages template based upon HTML and Markdown for personal, portfolio-based websites.

HTML 13,611 46,282 Updated Mar 16, 2025

microsoft / sarathi-serve

A low-latency & high-throughput serving engine for LLMs

Python 327 42 Updated Jan 31, 2025

project-etalon / etalon

LLM Serving Performance Evaluation Harness

Python 70 10 Updated Feb 25, 2025

tyler-griggs / melange-release

Python 45 5 Updated Jun 27, 2024

FudanSELab / train-ticket

Forked from hechuan73/train_ticket

Train Ticket - A Benchmark Microservice System

Java 753 252 Updated Mar 3, 2024

microsoft / generative-ai-for-beginners

21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

Jupyter Notebook 75,478 39,094 Updated Mar 21, 2025

hahnyuan / LLM-Viewer

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 416 52 Updated Sep 11, 2024
