Stars
Open standard for machine learning interoperability
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Transformer-related optimizations, including BERT and GPT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A flexible and efficient training framework for large-scale alignment tasks
Run your deep learning workloads on Kubernetes more easily and efficiently.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Retrieval and Retrieval-augmented LLMs
Ongoing research training transformer models at scale
Tensors and Dynamic neural networks in Python with strong GPU acceleration
An Open Source Machine Learning Framework for Everyone
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practice for training LLaMA models in Megatron-LM.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
Survey: a collection of AWESOME papers and resources on large language model (LLM)-related recommender systems.
An easy-to-use framework for large scale recommendation algorithms.
Fast and memory-efficient exact attention
FlashInfer: Kernel Library for LLM Serving
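One of the topics collected above, speculative decoding, can be illustrated with a toy sketch: a cheap draft model proposes a few tokens ahead, and the target model verifies them, accepting the longest matching prefix. The `target_next` and `draft_next` functions below are hypothetical stand-ins, not taken from any repository listed here; real systems verify all drafted positions with one batched forward pass of the target model.

```python
# Toy sketch of greedy speculative decoding (hypothetical models, not from
# any repo above). A cheap "draft" model proposes k tokens; the "target"
# model verifies them, accepting the longest prefix that matches its own
# greedy choice. The output is identical to decoding with the target alone.

def target_next(seq):
    # Hypothetical "target" model: next token = (sum of sequence) mod 10.
    return sum(seq) % 10

def draft_next(seq):
    # Hypothetical cheaper "draft" model: usually, but not always, agrees.
    return (sum(seq) + (1 if len(seq) % 5 == 0 else 0)) % 10

def greedy_decode(prompt, n_tokens):
    # Baseline: decode n_tokens with the target model only.
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(target_next(seq))
    return seq[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    end = len(prompt) + n_tokens
    while len(seq) < end:
        # Draft proposes up to k tokens autoregressively.
        ctx = seq[:]
        proposal = []
        for _ in range(min(k, end - len(seq))):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies: accept the matching prefix, then substitute
        # its own token at the first mismatch.
        for t in proposal:
            want = target_next(seq)
            if t == want:
                seq.append(t)
            else:
                seq.append(want)
                break
        else:
            # Every drafted token accepted: the target's verification also
            # yields one extra "free" token.
            if len(seq) < end:
                seq.append(target_next(seq))
    return seq[len(prompt):]
```

Because every accepted token equals the target model's own greedy choice at that position, `speculative_decode` always reproduces `greedy_decode` exactly; the speed-up in real systems comes from verifying the k drafted tokens in parallel rather than one target forward pass per token.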