Training
Weight Initialization
Choosing the starting values of a network's parameters before training.
Definition
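Weight initialization sets the starting values of a neural network's parameters before any gradient updates. Poor choices can cause gradients to vanish or explode as they propagate through the layers, making training slow or impossible. Common strategies include Xavier (Glorot) initialization, which scales weight variance by both fan-in and fan-out and suits sigmoid- or tanh-based networks, and Kaiming (He) initialization, which scales weight variance by fan-in and suits ReLU networks. Large transformers often use schemes tuned for their depth, for example by scaling down the weights that feed into residual connections.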
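Example
As a rough illustration of why the scale matters, the sketch below compares three initializers by pushing one random input through a stack of ReLU layers and measuring the size of the output. It uses only the Python standard library. The function names, the fixed standard deviation of 0.01 for the baseline, and the depth and width defaults are all assumptions chosen for this example; deep learning frameworks ship their own initializers, which may differ in detail.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform: weight variance is 2 / (fan_in + fan_out)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]


def kaiming_normal(fan_in, fan_out, rng):
    """Kaiming/He normal for ReLU layers: weight variance is 2 / fan_in."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


def naive_normal(fan_in, fan_out, rng):
    """Baseline (assumed for illustration): fixed std of 0.01, ignoring width."""
    return [[rng.gauss(0.0, 0.01) for _ in range(fan_out)]
            for _ in range(fan_in)]


def relu_forward_rms(init_fn, depth=10, width=128, seed=0):
    """Pass one random vector through `depth` bias-free ReLU layers.

    Returns the root-mean-square of the final activations, a crude
    proxy for whether the signal is shrinking, stable, or growing.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        w = init_fn(width, width, rng)
        # Matrix-vector product followed by ReLU.
        x = [max(0.0, sum(x[i] * w[i][j] for i in range(width)))
             for j in range(width)]
    return math.sqrt(sum(v * v for v in x) / width)


if __name__ == "__main__":
    for fn in (naive_normal, xavier_uniform, kaiming_normal):
        print(fn.__name__, relu_forward_rms(fn))
```

With these settings, the fixed-scale baseline should collapse the activations toward zero within a few layers, while the Kaiming variant should keep them at roughly the same order of magnitude as the input. The same shrinkage affects gradients on the backward pass, which is the vanishing-gradient problem described above.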