andy-yang-1

26 stars written in Python

Interact with your documents using the power of GPT, 100% privately, no data leaks

Python 55,185 7,413 Updated Nov 13, 2024

ChatGLM-6B: An Open Bilingual Dialogue Language Model

Python 41,030 5,242 Updated Jun 27, 2024

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,737 4,615 Updated Feb 11, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 37,397 5,624 Updated Feb 12, 2025

Fast and memory-efficient exact attention

Python 15,418 1,453 Updated Feb 11, 2025

Ongoing research training transformer models at scale

Python 11,319 2,541 Updated Feb 11, 2025

StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

Python 9,988 739 Updated Dec 4, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 9,320 889 Updated Feb 12, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,258 558 Updated Oct 28, 2024

A first-generation creative AI based on open-source GPT2.0 | Extensible and evolvable

Python 5,335 912 Updated Mar 31, 2024

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 3,314 199 Updated Feb 12, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,833 226 Updated Feb 12, 2025

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,721 229 Updated Feb 11, 2025

Sky-T1: Train your own O1 preview model within $450

Python 2,482 269 Updated Feb 12, 2025

A collection of compiler learning resources.

Python 2,269 344 Updated May 27, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,786 102 Updated Jan 21, 2024

A curated list for Efficient Large Language Models

Python 1,434 106 Updated Feb 10, 2025

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,189 72 Updated Oct 14, 2024

Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)

Python 943 105 Updated Jan 2, 2025

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 422 26 Updated Feb 10, 2025

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 414 19 Updated Oct 16, 2024

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 385 44 Updated Nov 13, 2024

Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"

Python 296 24 Updated Dec 20, 2023

Puzzles for learning Triton, play it with minimal environment configuration!

Python 220 15 Updated Dec 3, 2024

Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers

Python 205 9 Updated Aug 19, 2024

A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders

Python 19 1 Updated Jan 24, 2025