View yaox12's full-sized avatar
🐳
Slacking


FlashInfer: Kernel Library for LLM Serving

Cuda 2,321 241 Updated Mar 9, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 11,645 1,185 Updated Mar 10, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 40,890 6,155 Updated Mar 10, 2025

Line-by-line profiling for Python

Python 2,882 125 Updated Jan 30, 2025

NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the effective training time by minimizing the downtime due to fa…

Python 95 9 Updated Feb 21, 2025

A tool to convert HDR files to Adaptive HDR (Gain Map HDR) and ISO HDR format in HEIC

Swift 48 2 Updated Jan 14, 2025

Fast and memory-efficient exact attention

Python 16,176 1,532 Updated Mar 9, 2025

Simple, safe way to store and distribute tensors

Python 3,155 229 Updated Mar 5, 2025

CUDA Core Compute Libraries

C++ 1,509 200 Updated Mar 10, 2025

A new markup-based typesetting system that is powerful and easy to learn.

Rust 38,200 1,047 Updated Mar 7, 2025

Ongoing research training transformer models at scale

Python 11,680 2,621 Updated Mar 8, 2025

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 140,890 28,229 Updated Mar 8, 2025

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ 313 55 Updated Mar 10, 2025

The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.

C++ 354 50 Updated Mar 5, 2025

A tool (and pre-commit hook) to automatically upgrade syntax for newer versions of the language.

Python 3,719 188 Updated Feb 17, 2025

An Aspiring Drop-In Replacement for NumPy at Scale

Python 828 81 Updated Mar 5, 2025

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,457 222 Updated Mar 3, 2025

nanobind: tiny and efficient C++/Python bindings

C++ 2,636 220 Updated Mar 2, 2025

Rust 28 15 Updated Dec 7, 2024

A guide that teaches you how to enable hardware HEVC decoding & encoding for Chrome / Edge, or build a custom version of Chromium / Electron that supports hardware & software HEVC decoding and hardware HEVC…

JavaScript 1,293 63 Updated Feb 28, 2025

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Python 8,787 361 Updated Feb 9, 2025

Unified Collective Communication Library

C 231 105 Updated Mar 6, 2025

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

C 1,237 441 Updated Mar 9, 2025

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on h…

Cuda 140 27 Updated Mar 2, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 2,249 377 Updated Mar 9, 2025

WholeGraph - large scale Graph Neural Networks

Cuda 101 38 Updated Nov 25, 2024

[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl

C++ 2,297 186 Updated Feb 7, 2024

[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl

C++ 4,953 756 Updated Feb 8, 2024

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,460 538 Updated Mar 10, 2025

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python 1,034 125 Updated Apr 17, 2024