- Together AI
- LA & SF
- ericauld.github.io
- @aulderic
- in/eric-auld
Stars
FlexAttention w/ FlashAttention3 Support
Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP, and ColPali
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently…
ericauld / flash-attention
Forked from Dao-AILab/flash-attention. Fast and memory-efficient exact attention
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Building blocks for foundation models.
Fast and memory-efficient exact attention
Links to GPU programming news and materials