wmhst7's starred repositories

PerFlow-AI is a programmable performance analysis, modeling, and prediction tool for AI systems.

Python · 18 stars · 2 forks · Updated Mar 20, 2025

A simple calculation of LLM MFU (Model FLOPs Utilization).

HTML · 27 stars · 2 forks · Updated Mar 4, 2025

The Official Implementation of Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference

Python · 67 stars · Updated Jan 23, 2025

CUDA Templates for Linear Algebra Subroutines

C++ · 7,165 stars · 1,175 forks · Updated Mar 21, 2025

My learning notes and code for ML systems (MLSys).

Python · 1,535 stars · 81 forks · Updated Mar 24, 2025

FlashMLA: Efficient MLA decoding kernels

C++ · 11,369 stars · 807 forks · Updated Mar 1, 2025

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python · 20,965 stars · 2,611 forks · Updated Mar 4, 2025

📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉

3,708 stars · 261 forks · Updated Mar 4, 2025

A simple, performant, and scalable JAX LLM!

Python · 1,664 stars · 333 forks · Updated Mar 24, 2025

Large Language Model (LLM) Systems Paper List

826 stars · 32 forks · Updated Mar 19, 2025

A PyTorch native library for large model training

Python · 3,490 stars · 319 forks · Updated Mar 24, 2025

Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Python · 1,024 stars · 46 forks · Updated Feb 23, 2025

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

CUDA · 1,027 stars · 68 forks · Updated Mar 21, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook · 7,064 stars · 452 forks · Updated Mar 22, 2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Python · 153 stars · 7 forks · Updated Oct 30, 2024

verl: Volcano Engine Reinforcement Learning for LLMs

Python · 5,578 stars · 548 forks · Updated Mar 24, 2025

📰 Must-read papers and blogs on Speculative Decoding ⚡️

656 stars · 32 forks · Updated Mar 21, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python · 31,733 stars · 2,961 forks · Updated Mar 24, 2025

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python · 5,896 stars · 545 forks · Updated Mar 13, 2025

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python · 9,222 stars · 653 forks · Updated Mar 24, 2025

2025 AI/ML internship & new graduate job list updated daily

946 stars · 27 forks · Updated Mar 24, 2025

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

CUDA · 240 stars · 17 forks · Updated Oct 28, 2024

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

CUDA · 202 stars · 16 forks · Updated Sep 24, 2023

Open CS Application | 开源CS申请 ("Open-source CS applications")

JavaScript · 2,104 stars · 244 forks · Updated Feb 23, 2025

FlashInfer: Kernel Library for LLM Serving

CUDA · 2,467 stars · 258 forks · Updated Mar 24, 2025

A throughput-oriented high-performance serving framework for LLMs

CUDA · 777 stars · 30 forks · Updated Sep 21, 2024

A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).

Python · 152 stars · 12 forks · Updated Jul 5, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook · 512 stars · 53 forks · Updated Aug 19, 2024