Stars
[ICML 2024] Serving LLMs on heterogeneous decentralized clusters.
Official code for the published paper 'Solve routing problems with a residual edge-graph attention neural network'
yshop Yixiang Ordering (scan-to-order) system: an online ordering (delivery and self-pickup) mini-program, with multi-store and SaaS multi-tenant support. Core stack: Java 17 + Spring Boot 3 + Vue 3 + uniapp (Vue 3), supporting H5 and WeChat Mini Programs. A front-end/back-end separated ordering system built on a popular technology combination: SpringBoot3, Spring Security OAuth2, MybatisPlus, SpringSecurity, JWT, …
Official Repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization"
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
Large Language Model (LLM) Systems Paper List
Serving LLMs on heterogeneous decentralized clusters.
SpotServe: Serving Generative Large Language Models on Preemptible Instances
Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A repository for personal notes and annotated papers collected during daily research.
Efficient and easy multi-instance LLM serving
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
A large-scale simulation framework for LLM inference
[NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank
GitHub Pages template based upon HTML and Markdown for personal, portfolio-based websites.
A low-latency & high-throughput serving engine for LLMs
LLM Serving Performance Evaluation Harness
FudanSELab / train-ticket
Forked from hechuan73/train_ticket. Train Ticket - A Benchmark Microservice System
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.