Activation Checkpointing
Saving memory in training by recomputing the model's intermediate working values instead of storing them all.
Definition
Activation checkpointing reduces the memory used during training by storing only a subset of activations (the model's intermediate working values) during the forward pass, instead of keeping all of them. The discarded activations are recomputed from the nearest stored ones during the backward pass, when they are needed to compute gradients. This trades extra computation for lower memory, allowing larger models or bigger batches to fit on the same hardware. It is also known as gradient checkpointing.
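To make the trade-off concrete, here is a minimal pure-Python sketch, not any framework's API. It assumes a toy "model" that is a chain of scalar layers y = tanh(w * x). All names (`layer`, `grad_full`, `grad_checkpointed`, `segment`) are hypothetical and chosen for illustration. The sketch computes the gradient of the output with respect to the input twice: once storing every layer input, and once storing only the input to each segment and recomputing the rest during the backward pass.

```python
import math

def layer(w, x):
    # Toy layer: y = tanh(w * x)
    return math.tanh(w * x)

def layer_grad(w, x):
    # dy/dx for the toy layer, evaluated at the layer's input x
    y = math.tanh(w * x)
    return w * (1.0 - y * y)

def grad_full(weights, x):
    """Baseline: store every layer input, then backpropagate."""
    inputs = []
    for w in weights:
        inputs.append(x)          # one stored activation per layer
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, xin)
    return g, len(inputs)         # gradient, number of stored activations

def grad_checkpointed(weights, x, segment):
    """Store only each segment's input; recompute the rest in backward."""
    n = len(weights)
    checkpoints = {}
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x    # keep only the segment boundary
        x = layer(w, x)

    g = 1.0
    peak = 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        # Recompute this segment's layer inputs from its checkpoint.
        xin = checkpoints[start]
        inputs = []
        for i in range(start, end):
            inputs.append(xin)
            xin = layer(weights[i], xin)
        # inputs[0] is the checkpoint itself, so do not count it twice.
        peak = max(peak, len(checkpoints) + len(inputs) - 1)
        for i in reversed(range(start, end)):
            g *= layer_grad(weights[i], inputs[i - start])
    return g, peak                # gradient, peak stored activations

weights = [0.5 + 0.1 * i for i in range(16)]
g_full, stored_full = grad_full(weights, 0.3)
g_ckpt, stored_ckpt = grad_checkpointed(weights, 0.3, segment=4)
print(g_full, stored_full)
print(g_ckpt, stored_ckpt)
```

Both functions return the same gradient, but the checkpointed version holds fewer activations at once and pays for it by re-running each segment's forward computation. In this sketch the segment length is a tunable assumption: shorter segments mean more stored checkpoints, while longer segments mean more recomputed values held at once during the backward pass.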