Stars
Open standard for machine learning interoperability
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Transformer-related optimizations, including BERT and GPT
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A flexible and efficient training framework for large-scale alignment tasks
Run your deep learning workloads on Kubernetes more easily and efficiently.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Retrieval and Retrieval-augmented LLMs
Ongoing research training transformer models at scale
Tensors and Dynamic neural networks in Python with strong GPU acceleration
An Open Source Machine Learning Framework for Everyone
alibaba / Megatron-LLaMA
Forked from NVIDIA/Megatron-LM. Best practice for training LLaMA models in Megatron-LM.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, Flash-Attention, Paged-Attention, Parallelism, etc. 🎉🎉
Survey: a collection of AWESOME papers and resources on large language model (LLM)-related recommender systems.
An easy-to-use framework for large scale recommendation algorithms.
Fast and memory-efficient exact attention
FlashInfer: Kernel Library for LLM Serving
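One of the topics collected above, speculative decoding, can be illustrated with a toy sketch: a cheap draft model proposes a few tokens ahead, and the target model verifies them, accepting the longest matching prefix. The `target_next` and `draft_next` functions below are hypothetical stand-ins, not taken from any repository listed here; real systems verify all drafted positions with one batched forward pass of the target model.

```python
# Toy sketch of greedy speculative decoding (hypothetical models, not from
# any repo above). A cheap "draft" model proposes k tokens; the "target"
# model verifies them, accepting the longest prefix that matches its own
# greedy choice. The output is identical to decoding with the target alone.

def target_next(seq):
    # Hypothetical "target" model: next token = (sum of sequence) mod 10.
    return sum(seq) % 10

def draft_next(seq):
    # Hypothetical cheaper "draft" model: usually, but not always, agrees.
    return (sum(seq) + (1 if len(seq) % 5 == 0 else 0)) % 10

def greedy_decode(prompt, n_tokens):
    # Baseline: decode n_tokens with the target model only.
    seq = list(prompt)
    for _ in range(n_tokens):
        seq.append(target_next(seq))
    return seq[len(prompt):]

def speculative_decode(prompt, n_tokens, k=4):
    seq = list(prompt)
    end = len(prompt) + n_tokens
    while len(seq) < end:
        # Draft proposes up to k tokens autoregressively.
        ctx = seq[:]
        proposal = []
        for _ in range(min(k, end - len(seq))):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies: accept the matching prefix, then substitute
        # its own token at the first mismatch.
        for t in proposal:
            want = target_next(seq)
            if t == want:
                seq.append(t)
            else:
                seq.append(want)
                break
        else:
            # Every drafted token accepted: the target's verification also
            # yields one extra "free" token.
            if len(seq) < end:
                seq.append(target_next(seq))
    return seq[len(prompt):]
```

Because every accepted token equals the target model's own greedy choice at that position, `speculative_decode` always reproduces `greedy_decode` exactly; the speed-up in real systems comes from verifying the k drafted tokens in parallel rather than one target forward pass per token.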