Training
Model Parallelism
Splitting one model across multiple devices so it fits in memory.
Definition
Model parallelism divides a single model across several devices so that a network too large for one GPU's memory can still be trained and run. It includes tensor parallelism, which splits the weight matrices within a layer across devices, and pipeline parallelism, which assigns consecutive groups of layers to different devices. It is often combined with data parallelism to scale very large models.
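Example
To make the two variants concrete, here is a minimal sketch in NumPy that simulates them on a single machine. The two-layer network, the shapes, and the function names (`reference`, `tensor_parallel`, `pipeline_parallel`) are assumptions made for illustration; list entries stand in for devices, and a plain sum stands in for the all-reduce a real framework would perform.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))     # batch of 4 inputs, 8 features each
w1 = rng.standard_normal((8, 16))   # first layer weights (hypothetical sizes)
w2 = rng.standard_normal((16, 8))   # second layer weights


def reference(x):
    """Whole model on one device: linear -> ReLU -> linear."""
    return np.maximum(x @ w1, 0) @ w2


def tensor_parallel(x, n_devices=2):
    """Split each weight matrix across simulated devices.

    w1 is split by columns and w2 by rows, so each device holds only a
    slice of every layer. ReLU is elementwise, so it can be applied to
    each column slice independently.
    """
    w1_shards = np.split(w1, n_devices, axis=1)
    w2_shards = np.split(w2, n_devices, axis=0)
    partials = [np.maximum(x @ a, 0) @ b for a, b in zip(w1_shards, w2_shards)]
    # Summing the partial outputs stands in for an all-reduce.
    return sum(partials)


def pipeline_parallel(x, n_microbatches=2):
    """Assign whole layers to simulated devices and stream micro-batches."""
    stages = [
        lambda h: np.maximum(h @ w1, 0),  # stage 0 holds layer 1
        lambda h: h @ w2,                 # stage 1 holds layer 2
    ]
    outputs = []
    for micro_batch in np.split(x, n_microbatches, axis=0):
        h = micro_batch
        for stage in stages:
            h = stage(h)  # on real hardware: send activations to next device
        outputs.append(h)
    return np.concatenate(outputs, axis=0)


print(np.allclose(tensor_parallel(x), reference(x)))
print(np.allclose(pipeline_parallel(x), reference(x)))
```

Both functions reproduce the single-device result up to floating-point rounding. This sketch runs the stages sequentially; on real hardware, pipeline stages work on different micro-batches at the same time, which is where the speedup comes from.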