Batch Normalization

Normalizing layer activations across a mini-batch to stabilize and speed up training.

Definition

Batch normalization rescales a layer's activations (the intermediate numbers it computes) so that, within each mini-batch (the small group of training examples processed together), every feature has roughly zero mean and unit variance. The network then learns a scale and a shift for each feature, so it can adjust or undo the normalization where that helps. This stabilizes training, allows faster learning, and gently discourages overfitting. It became standard in image networks, but Transformers largely use layer normalization instead, because per-batch statistics are less meaningful there.
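
The definition above can be sketched in a few lines of NumPy. This is a minimal illustration of the training-time computation only, not a reference implementation: the function name `batch_norm`, the parameter names `gamma` (learned scale) and `beta` (learned shift), and the small constant `eps` (added to avoid division by zero) are assumptions chosen for this example.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch axis, then scale and shift.

    x has shape (batch_size, num_features). gamma and beta have shape
    (num_features,) and stand in for the learned scale and shift.
    """
    mean = x.mean(axis=0)                     # per-feature mean over the mini-batch
    var = x.var(axis=0)                       # per-feature variance over the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # roughly zero mean, unit variance
    return gamma * x_hat + beta               # scale and shift

# Hypothetical mini-batch: 8 examples, 4 features, deliberately off-center.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4))

gamma = np.ones(4)    # scale initialized to 1 (leaves normalized values as-is)
beta = np.zeros(4)    # shift initialized to 0
y = batch_norm(x, gamma, beta)
```

With `gamma` at 1 and `beta` at 0, each column of `y` has a mean close to 0 and a standard deviation close to 1, whatever the original offset and spread of `x`. In a real network the two parameters would be updated by gradient descent along with the other weights, and frameworks typically keep running averages of the mean and variance to use at inference time, which this sketch leaves out.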