
Weight Initialization

Choosing the starting values of a network's parameters before training.

Definition

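Weight initialization sets the starting values of a neural network's parameters before any gradient updates. Weights that start too small or too large shrink or amplify signals layer by layer, causing vanishing or exploding gradients that make training slow or impossible. Common strategies include Xavier (Glorot) initialization, which scales the weight variance by both fan-in and fan-out and suits sigmoid or tanh networks, and Kaiming (He) initialization, which scales by fan-in alone to compensate for ReLU zeroing roughly half of the activations. Large transformers often use depth-aware schemes, for example shrinking the initial weights of residual branches as the number of layers grows.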
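
The effect of the starting scale can be seen in a small experiment. The sketch below is illustrative only: it uses plain Python rather than a deep learning framework, the function names (`xavier_normal`, `kaiming_normal`, `naive_normal`, `signal_after`) are hypothetical, and the depth of 20, width of 128, and naive standard deviation of 0.01 are arbitrary assumptions. It pushes a random input through a stack of ReLU layers and reports the mean squared activation at the end, which should stay near its starting magnitude under Kaiming scaling and shrink under the other two.

```python
import math
import random


def xavier_normal(fan_in, fan_out, rng):
    """Xavier/Glorot normal: Var(w) = 2 / (fan_in + fan_out)."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def kaiming_normal(fan_in, fan_out, rng):
    """Kaiming/He normal for ReLU: Var(w) = 2 / fan_in."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def naive_normal(fan_in, fan_out, rng, std=0.01):
    """Fixed small standard deviation, ignoring layer size (illustrative baseline)."""
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def relu_layer(weights, x):
    """Apply y = relu(W x) for a weight matrix stored as a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def mean_square(x):
    """Average of squared entries, used here as a proxy for signal magnitude."""
    return sum(v * v for v in x) / len(x)


def signal_after(init_fn, depth=20, width=128, seed=0):
    """Mean squared activation after `depth` ReLU layers initialized by `init_fn`."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        x = relu_layer(init_fn(width, width, rng), x)
    return mean_square(x)


if __name__ == "__main__":
    for name, fn in [("naive", naive_normal),
                     ("xavier", xavier_normal),
                     ("kaiming", kaiming_normal)]:
        print(f"{name:8s} mean-square activation: {signal_after(fn):.3e}")
```

With square layers, Xavier's variance is half of Kaiming's, so under ReLU the signal roughly halves at every layer. That is why the Xavier result falls well below the Kaiming result in this setup, and the fixed small scale falls far below both. Gradients in the backward pass scale in a similar way, which is the link to vanishing and exploding gradients.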