zhuohan123

Organizations

@alpa-projects @vllm-project


The best OSS video generation models

Python 2,570 263 Updated Dec 18, 2024

Manipulating Python Programs

Python 620 27 Updated Dec 20, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 815 68 Updated Dec 27, 2024

A throughput-oriented high-performance serving framework for LLMs

Cuda 671 27 Updated Sep 21, 2024

Dynamic Memory Management for Serving LLMs without PagedAttention

C 262 16 Updated Dec 6, 2024

A framework for few-shot evaluation of language models.

Python 7,318 1,973 Updated Dec 25, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 254 21 Updated Oct 30, 2024

Blazingly fast LLM inference.

Rust 4,728 327 Updated Dec 27, 2024

🙌 OpenHands: Code Less, Make More

Python 39,236 4,424 Updated Dec 28, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 45 63 Updated Dec 26, 2024

Tile primitives for speedy kernels

Cuda 1,840 86 Updated Dec 23, 2024

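A visual no-code/code-free web crawler/spider. 易采集: a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.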

JavaScript 36,690 4,492 Updated Dec 23, 2024

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python 141 8 Updated Oct 15, 2024

Arena-Hard-Auto: An automatic LLM benchmark.

Jupyter Notebook 690 81 Updated Dec 14, 2024
Python 2,093 181 Updated Dec 18, 2024

CUDA/Metal accelerated language model inference

C 469 19 Updated Dec 18, 2024

DSPy: The framework for programming—not prompting—language models

Python 20,520 1,550 Updated Dec 27, 2024

A parallel framework for training deep neural networks

Python 49 5 Updated Dec 14, 2024

[ICML 2024] CLLMs: Consistency Large Language Models

Python 363 18 Updated Nov 16, 2024

A simple library for scaling up JAX programs

Python 128 10 Updated Nov 2, 2024

Grok open release

Python 49,752 8,346 Updated Aug 30, 2024

Universal LLM Deployment Engine with ML Compilation

Python 19,490 1,602 Updated Dec 19, 2024

Standardized Serverless ML Inference Platform on Kubernetes

Python 3,734 1,088 Updated Dec 27, 2024

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Python 8,315 826 Updated Dec 26, 2024

CUDA Python: Performance meets Productivity

Python 1,015 84 Updated Dec 27, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,042 336 Updated Dec 20, 2024

Large World Model -- Modeling Text and Video with Millions Context

Python 7,189 555 Updated Oct 19, 2024

Building a quick conversation-based search demo with Lepton AI.

TypeScript 7,901 1,010 Updated Dec 18, 2024

LlamaIndex is a data framework for your LLM applications

Python 37,619 5,405 Updated Dec 27, 2024