Foundations
Activation Function
The nonlinear step that lets stacked network layers learn complex patterns.
Definition
An activation function transforms a neuron's weighted input into its output in a nonlinear way. Without it, stacking layers would collapse into a single linear transformation, sharply limiting what a network can learn. The choice also affects how smoothly the network trains: sigmoid and tanh flatten out for large inputs, which shrinks the gradients passed back through them, while ReLU keeps a constant gradient for positive inputs. Common options include ReLU, sigmoid, and tanh, while modern Transformers tend to use GELU or SwiGLU. These are all named recipes for that nonlinear step.
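The collapse described above can be checked with a small sketch. It is illustrative only: the scalar "layers", the weights `W1`, `B1`, `W2`, `B2`, and the helper names `layer`, `two_linear`, `collapsed`, and `two_with_relu` are assumptions made for this example and do not come from any library. GELU is written in its exact form using the standard normal CDF; SwiGLU is left out because it needs extra gating weights.

```python
import math

# A few common activation functions, written for a single scalar input.
def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def gelu(x):
    # Exact GELU: x times the standard normal CDF evaluated at x.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

# Hypothetical one-neuron "layer": weighted input, then optional activation.
def layer(w, b, x, act=None):
    z = w * x + b
    return act(z) if act is not None else z

# Arbitrary example weights (assumed values, chosen only for illustration).
W1, B1 = 2.0, 1.0
W2, B2 = -3.0, 0.5

def two_linear(x):
    # Two stacked layers with no activation in between.
    return layer(W2, B2, layer(W1, B1, x))

def collapsed(x):
    # The same computation folded into one linear layer:
    # W2 * (W1 * x + B1) + B2 == (W2 * W1) * x + (W2 * B1 + B2)
    return (W2 * W1) * x + (W2 * B1 + B2)

def two_with_relu(x):
    # Same two layers, but with ReLU applied after the first one.
    return layer(W2, B2, layer(W1, B1, x, act=relu))

if __name__ == "__main__":
    for x in (-2.0, -0.25, 1.5):
        print(x, two_linear(x), collapsed(x), two_with_relu(x))
```

For every input, `two_linear` and `collapsed` agree, so the second layer adds no expressive power. With ReLU in between, the output bends at the point where the first layer's weighted input crosses zero, which no single linear layer can reproduce.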