
TiledCUDA

Introduction

TiledCUDA is a kernel template library designed for high efficiency. It provides a wrapper around CUTLASS CuTe that simplifies implementing complex fused kernels built on Tensor Core GEMM.
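To illustrate the kind of decomposition such templates encapsulate, here is a minimal host-side C++ sketch of a tiled GEMM. The output is split into TM×TN tiles, and each tile accumulates over TK-wide slices of the shared dimension, mirroring how a CUDA thread block would stage tiles through shared memory before issuing Tensor Core MMA instructions. The tile sizes and the name `tiled_gemm` are hypothetical and not part of the TiledCUDA API; this is a conceptual sketch, not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes; real kernels choose them to fit shared memory
// and the Tensor Core MMA shape.
constexpr std::size_t TM = 4, TN = 4, TK = 2;

// C[M x N] = A[M x K] * B[K x N], all row-major.
std::vector<float> tiled_gemm(const std::vector<float>& A,
                              const std::vector<float>& B,
                              std::size_t M, std::size_t N, std::size_t K) {
    // Keep the sketch free of edge-tile handling.
    assert(M % TM == 0 && N % TN == 0 && K % TK == 0);
    std::vector<float> C(M * N, 0.0f);
    // Each (bm, bn) pair is one output tile (one thread block on a GPU).
    for (std::size_t bm = 0; bm < M; bm += TM)
        for (std::size_t bn = 0; bn < N; bn += TN) {
            float acc[TM][TN] = {};  // per-tile accumulator (registers on a GPU)
            // March along K one TK-wide slice at a time.
            for (std::size_t bk = 0; bk < K; bk += TK) {
                // Stage the current A and B tiles (shared memory on a GPU).
                float a_tile[TM][TK], b_tile[TK][TN];
                for (std::size_t i = 0; i < TM; ++i)
                    for (std::size_t k = 0; k < TK; ++k)
                        a_tile[i][k] = A[(bm + i) * K + bk + k];
                for (std::size_t k = 0; k < TK; ++k)
                    for (std::size_t j = 0; j < TN; ++j)
                        b_tile[k][j] = B[(bk + k) * N + bn + j];
                // Tile-level multiply-accumulate (a Tensor Core MMA on a GPU).
                for (std::size_t i = 0; i < TM; ++i)
                    for (std::size_t j = 0; j < TN; ++j)
                        for (std::size_t k = 0; k < TK; ++k)
                            acc[i][j] += a_tile[i][k] * b_tile[k][j];
            }
            // Write the finished tile back (global memory on a GPU).
            for (std::size_t i = 0; i < TM; ++i)
                for (std::size_t j = 0; j < TN; ++j)
                    C[(bm + i) * N + bn + j] = acc[i][j];
        }
    return C;
}
```

On a GPU, the two outer loops would typically become the grid of thread blocks, and the staging copies are where asynchronous copies such as copy_async come into play.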

TiledCUDA uses PyTorch as its runtime environment and relies on PyTorch's Tensor class for convenient testing.
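A common pattern in PyTorch-based kernel tests is to compare a kernel's output with a reference computed by PyTorch operators using a tolerance check such as `torch.allclose`, which treats two elements as close when |a − b| ≤ atol + rtol·|b|. The minimal C++ sketch below restates that criterion, which can help when reasoning about why a low-precision kernel passes or fails such a check. The name `all_close` is hypothetical, the defaults mirror `torch.allclose` (rtol = 1e-5, atol = 1e-8), and whether a given TiledCUDA test uses this exact check is an assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Elementwise closeness check using the torch.allclose formula:
// |actual - expected| <= atol + rtol * |expected| for every element.
// NaNs are treated as not close (like torch.allclose with equal_nan=False).
bool all_close(const std::vector<float>& actual,
               const std::vector<float>& expected,
               double rtol = 1e-5, double atol = 1e-8) {
    if (actual.size() != expected.size()) return false;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        double a = actual[i], e = expected[i];
        if (std::isnan(a) || std::isnan(e)) return false;
        if (std::fabs(a - e) > atol + rtol * std::fabs(e)) return false;
    }
    return true;
}
```

Because the tolerance grows with |expected|, half-precision kernels are usually tested with a larger rtol than the defaults.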

Quick Start

Download

git clone git@github.com:TiledTensor/TiledCUDA.git
cd TiledCUDA && git submodule update --init --recursive

Installation

TiledCUDA requires CUDA 12.0 or later and a host compiler with C++20 support, such as GCC 10.0 or higher.
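Before building, it can help to confirm what the host toolchain actually reports. The minimal sketch below uses only standard predefined macros (`__cplusplus`, `__GNUC__`, `__clang__`); the function names are hypothetical, the thresholds simply restate the requirements above, and it does not check the CUDA version. Note that GCC 10 reports `__cplusplus` as 201709L under `-std=c++20` while later compilers report 202002L, so the sketch tests for "newer than C++17" rather than for 202002L exactly.

```cpp
#include <cstdio>

// True when compiling in a dialect newer than C++17. GCC 10 reports 201709L
// for -std=c++20; GCC 11+ and recent Clang report 202002L.
constexpr bool is_cpp20_mode() { return __cplusplus > 201703L; }

// True when the host compiler is GCC 10 or newer. Clang also defines
// __GNUC__ for compatibility, so it is excluded explicitly.
constexpr bool is_gcc10_or_newer() {
#if defined(__GNUC__) && !defined(__clang__)
    return __GNUC__ >= 10;
#else
    return false;
#endif
}

// Print what the compiler sees, for comparison with the requirements.
void report_host_toolchain() {
    std::printf("__cplusplus = %ld (C++20 mode: %s)\n",
                static_cast<long>(__cplusplus),
                is_cpp20_mode() ? "yes" : "no");
    std::printf("GCC 10 or newer: %s\n", is_gcc10_or_newer() ? "yes" : "no");
}
```

Compile it with the same compiler and language flags you plan to build with, since `__cplusplus` reflects the selected dialect rather than the newest one the compiler supports.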

Unit Test

Run a single Python unit test: make unit_test UNIT_TEST=test_scatter_nd.py
Run all Python unit tests: ./scripts/unittests/python.sh
Run a single C++ unit test: make unit_test_cpp CPP_UT=test_copy
Run all C++ unit tests: make unit_test_cpps

Features

Implemented __device__ function wrappers that enable static and dynamic copying between different levels of the memory hierarchy.
Implemented __device__ function wrappers for CUDA micro kernels, such as copy_async and Tensor Core operations.
Implemented template wrappers for CuTe to simplify its usage.
Implemented fused kernels such as GEMM, Back2Back GEMM, Batched GEMM, LSTM cell, etc.
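A Back2Back GEMM computes D = (A·B)·C. The benefit of fusing it, sketched below under simplifying assumptions, is that each block of rows of A produces a block of the intermediate A·B that the second GEMM consumes immediately, so the full intermediate never has to be written to global memory. This host-side C++ sketch processes ROW_TILE rows at a time with a small buffer standing in for on-chip storage; the name `b2b_gemm` and the tile size are hypothetical, and this is not TiledCUDA's implementation, which maps the computation onto CuTe tiles and Tensor Core instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-tile height; on a GPU one thread block would own these rows.
constexpr std::size_t ROW_TILE = 2;

// D[M x L] = (A[M x K] * B[K x N]) * C[N x L], all row-major.
// Only a ROW_TILE x N slice of A*B exists at any time.
std::vector<float> b2b_gemm(const std::vector<float>& A,
                            const std::vector<float>& B,
                            const std::vector<float>& C,
                            std::size_t M, std::size_t K,
                            std::size_t N, std::size_t L) {
    assert(M % ROW_TILE == 0);  // keep the sketch free of edge-tile handling
    std::vector<float> D(M * L, 0.0f);
    std::vector<float> inter(ROW_TILE * N);  // stand-in for shared memory/registers
    for (std::size_t bm = 0; bm < M; bm += ROW_TILE) {
        // First GEMM: inter = A[bm : bm+ROW_TILE, :] * B.
        for (std::size_t i = 0; i < ROW_TILE; ++i)
            for (std::size_t j = 0; j < N; ++j) {
                float s = 0.0f;
                for (std::size_t k = 0; k < K; ++k)
                    s += A[(bm + i) * K + k] * B[k * N + j];
                inter[i * N + j] = s;
            }
        // Second GEMM consumes the tile right away: D[bm : bm+ROW_TILE, :] = inter * C.
        for (std::size_t i = 0; i < ROW_TILE; ++i)
            for (std::size_t l = 0; l < L; ++l) {
                float s = 0.0f;
                for (std::size_t j = 0; j < N; ++j)
                    s += inter[i * N + j] * C[j * L + l];
                D[(bm + i) * L + l] = s;
            }
    }
    return D;
}
```

The trade-off is that every block needs all of B and C, which is why this fusion works best when N and L are small enough for the intermediate tile to stay on chip.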
