- University of California, Berkeley
- Berkeley, CA
- https://andy-yang-1.github.io/
Stars
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders
- Sky-T1: Train your own O1 preview model within $450
- [ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
- Puzzles for learning Triton; play with minimal environment configuration!
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
- HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations
- Awesome synthetic (text) datasets
- A fast communication-overlapping library for tensor parallelism on GPUs.
- A throughput-oriented high-performance serving framework for LLMs
- FlashInfer: Kernel Library for LLM Serving
- SGLang is a fast serving framework for large language models and vision language models.
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
- Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
- 📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
- Latency and Memory Analysis of Transformer Models for Training and Inference
- Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters
- A list of awesome compiler projects and papers for tensor computation and deep learning.
- TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
- Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
- A curated list for Efficient Large Language Models
- Fast and memory-efficient exact attention