Starred repositories
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
A highly optimized LLM inference acceleration engine for Llama and its variants.
[ICLR2025] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Efficient LLM Inference over Long Sequences
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
SGLang is a fast serving framework for large language models and vision language models.
A code repository for a PyTorch C++ (LibTorch) tutorial.
BS::thread_pool: a fast, lightweight, modern, and easy-to-use C++17 / C++20 / C++23 thread pool library
Lightning fast C++/CUDA neural network framework
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
Model Compression Toolbox for Large Language Models and Diffusion Models
FlashInfer: Kernel Library for LLM Serving
Stable Diffusion and Flux in pure C/C++
Official Code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
DrugAssist: A Large Language Model for Molecule Optimization
High-speed Large Language Model Serving for Local Deployment
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Implementation of the Llama architecture with RLHF + Q-learning
Implementation of super-fast C++-styled namedtuple, for compile-time reflection.
📦 CMake's missing package manager. A small CMake script for setup-free, cross-platform, reproducible dependency management.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Fast inference from large language models via speculative decoding