Starred repositories

324 results for source starred repositories

Slowdown prediction module of Echo: Simulating Distributed Training at Scale

Python 5 Updated Mar 12, 2025

Simulating Distributed Training at Scale

5 Updated Mar 18, 2025

A GUI Agent application based on UI-TARS(Vision-Language Model) that allows you to control your computer using natural language.

TypeScript 9,039 655 Updated Mar 27, 2025

PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.

Python 18 2 Updated Mar 20, 2025

A simple calculation for LLM MFU.

HTML 29 2 Updated Mar 4, 2025

The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Python 68 Updated Jan 23, 2025

CUDA Templates for Linear Algebra Subroutines

C++ 7,183 1,181 Updated Mar 21, 2025

My learning notes/codes for ML SYS.

Python 1,579 89 Updated Mar 27, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,379 810 Updated Mar 1, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 20,981 2,611 Updated Mar 4, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

3,725 263 Updated Mar 25, 2025

A simple, performant and scalable JAX LLM!

Python 1,666 335 Updated Mar 27, 2025

Large Language Model (LLM) Systems Paper List

829 33 Updated Mar 26, 2025

A PyTorch native library for large model training

Python 3,501 321 Updated Mar 27, 2025

Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python 1,036 44 Updated Feb 23, 2025

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Cuda 1,048 68 Updated Mar 25, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 7,117 451 Updated Mar 22, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python 153 7 Updated Oct 30, 2024

verl: Volcano Engine Reinforcement Learning for LLMs

Python 5,752 571 Updated Mar 27, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

661 33 Updated Mar 27, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 31,762 2,964 Updated Mar 27, 2025

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 5,898 546 Updated Mar 13, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 9,235 652 Updated Mar 25, 2025

2025 AI/ML internship & new graduate job list updated daily

955 27 Updated Mar 27, 2025

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda 242 17 Updated Oct 28, 2024

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda 202 16 Updated Sep 24, 2023

Open CS Application | 开源CS申请

JavaScript 2,106 244 Updated Feb 23, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,502 259 Updated Mar 27, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda 782 31 Updated Sep 21, 2024