All terms
Hardware & Systems
Sharding
Splitting data or model state into pieces spread across multiple devices.
Definition
Sharding is the practice of splitting data or model state into pieces, called shards, and distributing them across multiple devices or machines. In large-model training it lets weights, gradients, and optimizer state that would not fit on one GPU be spread over many, with each device holding only its portion. This reduces per-device memory pressure and underpins techniques for training and serving models too large for a single accelerator.