All terms
Foundations
GELU
A smooth activation function widely used inside modern Transformer models.
Definition
GELU, short for Gaussian Error Linear Unit, is an activation function — the small nonlinear step applied inside a neural network so it can learn complex patterns. It passes through more of the larger values and less of the smaller ones in a smooth way, and it is a common default in modern Transformer models.