Training

ZeRO

A memory-saving method that splits a model's training state (optimizer states, gradients, and weights) across GPUs to fit bigger models.

Definition

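ZeRO, short for Zero Redundancy Optimizer, is a set of memory-saving techniques from Microsoft's DeepSpeed library that avoid keeping a full copy of a model's weights, gradients, and optimizer states on every GPU (graphics card) during data-parallel training. Instead, each GPU stores only its own slice of that state and fetches the rest from the other GPUs when it is needed. Because the memory cost per GPU shrinks as the cluster grows, much larger models can be trained on the same hardware.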
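To make the saving concrete, here is a rough back-of-the-envelope sketch. It assumes mixed-precision training with the Adam optimizer, where each parameter costs about 2 bytes for the fp16 weight, 2 bytes for the fp16 gradient, and 12 bytes of fp32 optimizer state. It also assumes the usual three ZeRO stages (stage 1 splits optimizer states, stage 2 also splits gradients, stage 3 also splits weights). The function name `zero_memory_bytes` is made up for illustration and is not part of DeepSpeed. The estimate ignores activations, temporary buffers, and fragmentation.

```python
def zero_memory_bytes(num_params, num_gpus, stage, optimizer_bytes_per_param=12):
    """Estimate per-GPU bytes needed for model state under a given ZeRO stage.

    Assumption: mixed-precision Adam, i.e. 2 bytes/param for fp16 weights,
    2 bytes/param for fp16 gradients, and 12 bytes/param of fp32 optimizer
    state (master weights, momentum, variance). Stage 0 means no ZeRO.
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    weights = 2 * num_params
    grads = 2 * num_params
    optimizer = optimizer_bytes_per_param * num_params

    # Each stage splits one more kind of state evenly across the GPUs.
    if stage >= 1:
        optimizer /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        weights /= num_gpus

    return weights + grads + optimizer


if __name__ == "__main__":
    # Hypothetical setup: a 7.5-billion-parameter model on 64 GPUs.
    for stage in range(4):
        gigabytes = zero_memory_bytes(7_500_000_000, 64, stage) / 1e9
        print(f"stage {stage}: {gigabytes:.1f} GB per GPU")
```

Under these assumptions, a 7.5-billion-parameter model on 64 GPUs drops from roughly 120 GB of model state per GPU without ZeRO to about 31.4 GB, 16.6 GB, and 1.9 GB at stages 1, 2, and 3.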