Stars
An open-source C++ library developed and used at Facebook.
Tile primitives for speedy kernels
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization…
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
A guidance language for controlling large language models.
A high-throughput and memory-efficient inference and serving engine for LLMs
SGLang is a fast serving framework for large language models and vision language models.
HugeCTR is a high-efficiency GPU framework designed for training Click-Through Rate (CTR) estimation models
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Development repository for the Triton language and compiler
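📺 IPTV live TV source update project "✨ Instant-playback experience 🚀": supports IPv4/IPv6; supports custom channels; supports local, multicast, hotel, and subscription sources, plus keyword search; updates automatically twice a day, with results usable in players such as TVBox; runs via workflow, Docker (amd64/arm64/arm v7), command line, or GUI
Code and examples for "CUDA - From Correctness to Performance"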
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
All algorithms implemented in Python
Learn Low Level Design (LLD) and prepare for interviews using free resources.
A library for transfer learning by reusing parts of TensorFlow models.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
How to optimize some algorithms in CUDA.
The C-based gRPC (C++, Python, Ruby, Objective-C, PHP, C#)
FlashInfer: Kernel Library for LLM Serving
☄🌌️ The minimal, blazing-fast, and infinitely customizable prompt for any shell!
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM