Open standard for machine learning interoperability

Python 18,555 3,721 Updated Mar 6, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,626 1,131 Updated Mar 7, 2025

Transformer-related optimization, including BERT, GPT

C++ 6,068 899 Updated Mar 27, 2024

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 11,291 2,164 Updated Feb 1, 2025

A flexible and efficient training framework for large-scale alignment tasks

Python 318 24 Updated Feb 14, 2025

Run your deep learning workloads on Kubernetes more easily and efficiently.

Go 517 78 Updated Mar 4, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 37,240 4,281 Updated Mar 6, 2025

Retrieval and Retrieval-augmented LLMs

Python 8,811 642 Updated Mar 6, 2025

Ongoing research training transformer models at scale

Python 11,662 2,617 Updated Mar 7, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 87,607 23,537 Updated Mar 7, 2025

An Open Source Machine Learning Framework for Everyone

C++ 188,431 74,572 Updated Mar 7, 2025

Best practice for training LLaMA models in Megatron-LM

Python 646 56 Updated Jan 2, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

622 31 Updated Mar 6, 2025

📖 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉

3,591 252 Updated Mar 4, 2025

Survey: A collection of AWESOME papers and resources on the large language model (LLM) related recommender system topics.

1,213 68 Updated Mar 5, 2025

An easy-to-use framework for large-scale recommendation algorithms.

Python 90 18 Updated Mar 6, 2025

Fast and memory-efficient exact attention

Python 16,130 1,527 Updated Mar 7, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 2,298 240 Updated Mar 6, 2025