All terms
Training
FSDP
A method that splits a model across many GPUs so very large models can be trained.
Definition
FSDP, short for Fully Sharded Data Parallel, trains very large models by splitting their weights and related data across many graphics cards instead of copying everything onto each one. This lets a model that would never fit on a single card be trained across a whole cluster.