Skip to main content
All terms
Foundations

GELU

A smooth activation function widely used inside modern Transformer models.

Definition

GELU, short for Gaussian Error Linear Unit, is an activation function — the small nonlinear step applied inside a neural network so it can learn complex patterns. It passes through more of the larger values and less of the smaller ones in a smooth way, and it is a common default in modern Transformer models.