Stars
FlashInfer: Kernel Library for LLM Serving
SGLang is a fast serving framework for large language models and vision language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…
A tool to convert HDR files to Adaptive HDR (Gain Map HDR) and ISO HDR format in HEIC
Fast and memory-efficient exact attention
Simple, safe way to store and distribute tensors
A new markup-based typesetting system that is powerful and easy to learn.
Ongoing research training transformer models at scale
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
A tool (and pre-commit hook) to automatically upgrade syntax for newer versions of the language.
An Aspiring Drop-In Replacement for NumPy at Scale
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
nanobind: tiny and efficient C++/Python bindings
A guide that teaches you how to enable hardware HEVC decoding & encoding for Chrome / Edge, or to build a custom version of Chromium / Electron that supports hardware & software HEVC decoding and hardware HEVC…
Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)
Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…
WholeGraph - large scale Graph Neural Networks
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.