Skip to main content
All terms
Hardware & Systems

CUDA Kernel

A function that runs in parallel across many GPU threads.

Definition

A CUDA kernel is a function written to execute simultaneously across thousands of GPU threads. AI performance work often comes down to writing or tuning custom kernels (for attention, matrix multiply, etc.) to squeeze maximum throughput from the hardware. Libraries like FlashAttention and TensorRT ship highly optimized kernels.