Hardware & Systems

Sharding

Splitting data or model state into pieces spread across multiple devices.

Definition

Sharding is the practice of splitting data or model state into pieces, called shards, and distributing them across multiple devices or machines. In large-model training, it allows weights, gradients, and optimizer state that would not fit on a single GPU to be partitioned across many, so that each device holds only its own portion. This lowers per-device memory use and is the basis for techniques that train and serve models too large for a single accelerator.
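As a rough illustration of the idea, the sketch below splits a flat list into near-equal contiguous shards and then reassembles it. It is a toy under stated assumptions, not how any particular framework implements sharding: the list stands in for a flat parameter vector, list positions stand in for devices, and the names `shard` and `gather` are hypothetical helpers introduced only for this example.

```python
def shard(values, num_devices):
    """Split `values` into `num_devices` contiguous shards of near-equal size."""
    # The first `extra` shards take one additional element each, so shard
    # sizes differ by at most one.
    base, extra = divmod(len(values), num_devices)
    shards, start = [], 0
    for rank in range(num_devices):
        size = base + (1 if rank < extra else 0)
        shards.append(values[start:start + size])
        start += size
    return shards


def gather(shards):
    """Reassemble the full sequence from its shards, in rank order."""
    return [value for piece in shards for value in piece]


weights = list(range(10))               # stand-in for a flat parameter vector
shards = shard(weights, num_devices=4)  # each "device" holds one entry of `shards`
largest = max(len(piece) for piece in shards)
restored = gather(shards)
```

With 10 values over 4 devices, the largest shard holds 3 values rather than 10, which is the per-device memory reduction described above. Real systems shard tensors rather than lists and must also communicate between devices whenever the full state is needed.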