
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.

Python 22 1 Updated May 6, 2024

Online campus second-hand book trading platform

Java 20 Updated Mar 8, 2018

This is the official code for the published paper 'Solve routing problems with a residual edge-graph attention neural network'

Python 223 26 Updated Sep 5, 2023

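yshop Yixiang ordering system (scan-to-order): an online ordering mini-program (delivery and pickup) with multi-store support and SaaS multi-tenant support. Base stack: Java 17 + Spring Boot 3 + Vue 3 + uni-app (Vue 3), supporting H5 and WeChat mini-programs. A front-end/back-end separated ordering system built on a currently popular technology stack: SpringBoot3, Spring Security OAuth2, MybatisPlus, SpringSecurity, jwt, …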

PLpgSQL 774 214 Updated Mar 11, 2025

Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"

Jupyter Notebook 28 2 Updated Mar 5, 2024

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Python 9,507 545 Updated Sep 7, 2024

Large Language Model (LLM) Systems Paper List

823 32 Updated Mar 19, 2025

Serving LLMs on heterogeneous decentralized clusters.

Python 10 10 Updated Nov 30, 2023

SpotServe: Serving Generative Large Language Models on Preemptible Instances

112 9 Updated Feb 22, 2024

Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services

CSS 1 Updated Apr 26, 2024

Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"

Python 28 6 Updated Nov 24, 2024

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python 151 12 Updated Jul 5, 2024
Jupyter Notebook 89 7 Updated Nov 11, 2024

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 13,045 883 Updated Mar 22, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,891 185 Updated Mar 20, 2025

This repository is established to store personal notes and annotated papers during daily research.

115 8 Updated Mar 21, 2025
C++ 450 62 Updated Mar 20, 2025

Efficient and easy multi-instance LLM serving

Python 339 27 Updated Mar 21, 2025

PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".

Python 85 17 Updated May 23, 2023

A large-scale simulation framework for LLM inference

Python 350 61 Updated Nov 19, 2024
Jupyter Notebook 53 4 Updated Jun 13, 2024

[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank

Python 42 10 Updated Nov 4, 2024

GitHub Pages template based on HTML and Markdown for personal, portfolio-based websites.

HTML 13,611 46,282 Updated Mar 16, 2025

A low-latency & high-throughput serving engine for LLMs

Python 327 42 Updated Jan 31, 2025

LLM Serving Performance Evaluation Harness

Python 70 10 Updated Feb 25, 2025
Python 45 5 Updated Jun 27, 2024

Train Ticket - A Benchmark Microservice System

Java 753 252 Updated Mar 3, 2024

21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/

Jupyter Notebook 75,478 39,094 Updated Mar 21, 2025

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 416 52 Updated Sep 11, 2024