- UC Berkeley
- San Francisco Bay Area
- (UTC -08:00)
- https://zhuohan.li
- @zhuohan123
- in/zhuohan-li
Stars
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A throughput-oriented high-performance serving framework for LLMs
Dynamic Memory Management for Serving LLMs without PagedAttention
A framework for few-shot evaluation of language models.
A fast communication-overlapping library for tensor parallelism on GPUs.
HabanaAI / vllm-fork
Forked from vllm-project/vllm. A high-throughput and memory-efficient inference and serving engine for LLMs
A visual no-code/code-free web crawler/spider (易采集): visual browser automation testing, data collection, and crawler software that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
Arena-Hard-Auto: An automatic LLM benchmark.
DSPy: The framework for programming—not prompting—language models
A parallel framework for training deep neural networks
[ICML 2024] CLLMs: Consistency Large Language Models
Universal LLM Deployment Engine with ML Compilation
Standardized Serverless ML Inference Platform on Kubernetes
An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
CUDA Python: Performance meets Productivity
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
Large World Model -- Modeling Text and Video with Millions Context
Building a quick conversation-based search demo with Lepton AI.
LlamaIndex is a data framework for your LLM applications