FlexAttention w/ FlashAttention3 Support

Python 27 2 Updated Oct 5, 2024

Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali

Python 1,643 119 Updated Dec 31, 2024

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,516 75 Updated Dec 31, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,045 1,043 Updated Dec 26, 2024
C++ 25 3 Updated Jul 17, 2024

Fast and memory-efficient exact attention

Python 2 Updated Jul 30, 2024
Cuda 3 Updated Sep 2, 2024

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 583 113 Updated Oct 30, 2024

Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"

Python 355 32 Updated Feb 24, 2024

Mamba SSM architecture

Python 13,650 1,169 Updated Dec 6, 2024

Resources for Data Centric AI

TeX 1,105 117 Updated Dec 13, 2023

FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores

C++ 293 28 Updated Dec 28, 2024

Tile primitives for speedy kernels

Cuda 1,876 87 Updated Dec 23, 2024

Building blocks for foundation models.

429 17 Updated Jan 3, 2024

CUDA Core Compute Libraries

C++ 1,359 172 Updated Jan 1, 2025

LLM training in simple, raw C/CUDA

Cuda 24,883 2,825 Updated Oct 2, 2024

Fast and memory-efficient exact attention

Python 14,850 1,402 Updated Dec 31, 2024
Python 272 48 Updated Aug 1, 2024

GPU programming related news and material links

1,289 77 Updated Sep 23, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 5,914 1,024 Updated Dec 25, 2024