Starred repositories
SGLang is a fast serving framework for large language models and vision language models.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
A machine learning compiler for GPUs, CPUs, and ML accelerators
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…
How to learn PyTorch and OneFlow.
How to optimize some algorithms in CUDA.
Fast and memory-efficient exact attention
Awesome-LLM: a curated list of Large Language Model resources
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
Alluxio, data orchestration for analytics and machine learning in the cloud
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
LightSeq: A High Performance Library for Sequence Processing and Generation
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
Transformer related optimization, including BERT, GPT
Ongoing research training transformer models at scale
Training and serving large-scale neural networks with auto parallelization.
ImageBind: One Embedding Space to Bind Them All
Samples for CUDA developers that demonstrate features in the CUDA Toolkit