lecture_018

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
May 13, 2024
Sep 11, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024
May 13, 2024

Fused Kernels

Abstract

Kernel fusion is a popular technique for getting the most performance out of hardware. Researchers and practitioners sometimes rewrite their code as native CUDA or CPU kernels to reach optimal performance, but projects such as torch.compile aim to make this simpler. This talk focuses on generating fused kernels and on how to leverage torch.compile to do so. We shift away from LLMs for a bit and look at recommendation algorithms instead. Along the way, we create fused kernels (Triton and CUDA) with the help of torch.compile.
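To make the idea of fusion concrete, here is a minimal plain-Python sketch. It is not code from the talk and not torch.compile output. The function names `unfused` and `fused` and the scale, bias, and ReLU sequence are hypothetical, chosen only for illustration. The unfused version mimics three separate kernels that each read a full buffer and write a new one, while the fused version does the same arithmetic in a single pass with no intermediate buffers.

```python
def unfused(xs, scale, bias):
    # Three separate "kernels": each reads a full list and writes a new one.
    scaled = [x * scale for x in xs]        # pass 1, intermediate buffer
    shifted = [s + bias for s in scaled]    # pass 2, intermediate buffer
    return [max(v, 0.0) for v in shifted]   # pass 3, final output (ReLU)


def fused(xs, scale, bias):
    # One "kernel": a single pass over the input, no intermediate buffers.
    return [max(x * scale + bias, 0.0) for x in xs]


xs = [-2.0, -0.5, 0.0, 1.5, 3.0]

# Both versions compute the same values; only the memory traffic differs.
assert unfused(xs, 2.0, 1.0) == fused(xs, 2.0, 1.0)
print(fused(xs, 2.0, 1.0))  # [0.0, 0.0, 1.0, 4.0, 7.0]
```

On a GPU the same trade applies at larger scale: each unfused step is a separate kernel launch with a round trip through global memory, which is the overhead that fusing elementwise operations is meant to remove.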

Code and other artifacts