SGLang is a fast serving framework for large language models and vision language models.

Python 14,572 1,820 Updated May 23, 2025

Open standard for machine learning interoperability

Python 18,982 3,740 Updated May 22, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,537 1,444 Updated May 23, 2025

Transformer related optimization, including BERT, GPT

C++ 6,165 905 Updated Mar 27, 2024

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 11,632 2,197 Updated May 21, 2025

A flexible and efficient training framework for large-scale alignment tasks

Python 356 31 Updated May 22, 2025

Run your deep learning workloads on Kubernetes more easily and efficiently.

Go 521 78 Updated Mar 4, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 38,510 4,383 Updated May 23, 2025

Retrieval and Retrieval-augmented LLMs

Python 9,706 707 Updated May 22, 2025

Ongoing research training transformer models at scale

Python 12,403 2,782 Updated May 22, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 90,221 24,238 Updated May 23, 2025

An Open Source Machine Learning Framework for Everyone

C++ 190,079 74,675 Updated May 23, 2025

Best practice for training LLaMA models in Megatron-LM

Python 651 57 Updated Jan 2, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

740 44 Updated May 22, 2025

📚 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.

Python 4,027 277 Updated May 18, 2025

Survey: A collection of AWESOME papers and resources on the large language model (LLM) related recommender system topics.

1,304 75 Updated May 22, 2025

An easy-to-use framework for large scale recommendation algorithms.

Python 160 22 Updated May 22, 2025

Fast and memory-efficient exact attention

Python 17,471 1,693 Updated May 22, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,019 310 Updated May 22, 2025