*Figure: CUDA: Triton cache improves startup performance by ~20%*

*Figure: ROCm: Triton cache improves startup performance by ~20%*
This benchmark compares GPU memory usage and startup performance of a custom vllm
configuration using Triton flash attention in two scenarios:
- With Triton cache pre-loaded: cache exists from a previous run (warm start)
- Without Triton cache: clean cache state (cold start)
Key findings:
- Triton cache reduces startup time by approximately 20% (see the timing sketch after this list)
- More consistent memory usage patterns with cached kernels
- Improved resource utilization during initial model loading
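The timing gap is straightforward to reproduce by hand. A minimal sketch, where `inference.py` is a hypothetical stand-in for the workload script passed via `--script`:

```bash
# Hand-rolled cold/warm comparison (illustrative; benchmark.sh automates this)
export VLLM_ATTENTION_BACKEND=TRITON_FLASH
export TRITON_CACHE_DIR="$HOME/.triton/cache"

rm -rf "$TRITON_CACHE_DIR"    # cold start: force kernel recompilation
time python inference.py      # kernels compiled and written to the cache

time python inference.py      # warm start: cached kernels are reused
```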
Prerequisites:
- Triton installed
- Custom vllm fork with Triton support:

  ```bash
  git clone -b triton https://github.com/cmagina/vllm.git
  cd vllm && pip install -e .
  ```

- NVIDIA GPU (CUDA) or AMD GPU (ROCm)
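Before running the benchmark, the setup can be sanity-checked with a few one-liners (illustrative, not part of benchmark.sh):

```bash
python -c "import triton; print(triton.__version__)"   # Triton importable?
python -c "import vllm; print(vllm.__version__)"       # custom fork installed?
nvidia-smi || rocm-smi                                 # GPU visible to the driver?
```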
Usage:

```bash
# Basic run
./benchmark.sh --arch [cuda|rocm]

# Custom cache location and script
./benchmark.sh \
    --arch cuda \
    --triton-cache-dir ~/alternate_cache \
    --script ./custom_script.py
```
Output files:
- `gpu_usage_log.csv`: time-series GPU memory data
- `gpu_memory_usage_comparison.png`: visualization plot
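For a quick look at the CSV, something like the snippet below works; the column layout used here (timestamp, memory in MiB) is an assumption, so adjust the field index to match the actual header:

```bash
head -n 5 gpu_usage_log.csv   # inspect the header and first few samples

# Peak memory over the run, assuming memory usage is the second column
awk -F, 'NR > 1 && $2 + 0 > max { max = $2 + 0 } END { print "peak:", max, "MiB" }' gpu_usage_log.csv
```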
Methodology:
- Cold Start (no cache):
  - Purge the existing Triton cache
  - Run the inference script
  - Log GPU memory at 1 Hz (see the sampler sketch after this list)
- Warm Start (with cache):
  - Reuse the kernels generated during the cold-start run
  - Run the identical inference script
  - Compare memory and time metrics
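A minimal sketch of the 1 Hz sampler referenced above, assuming a single NVIDIA GPU (on ROCm, `rocm-smi --showmeminfo vram` reports the equivalent figure); the CSV schema here is illustrative, not necessarily the benchmark's exact format:

```bash
LOG=gpu_usage_log.csv
echo "timestamp,memory_used_mib" > "$LOG"
while true; do
    # memory.used in MiB, without the header line or unit suffix
    mem=$(nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits)
    echo "$(date +%s),${mem}" >> "$LOG"
    sleep 1
done
```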
Environment variables:

```bash
export VLLM_ATTENTION_BACKEND=TRITON_FLASH    # Required for Triton flash attention support
export TRITON_CACHE_DIR="$HOME/.triton/cache" # Default cache location ($HOME, since ~ does not expand inside quotes)
```
License: Apache 2.0 (see the LICENSE file)