Mixture of Depths
A technique where a router lets some tokens skip a Transformer layer to save computation.
Definition
Mixture of Depths is a conditional computation technique in which a learned router decides, per token and per layer, whether the token is processed by that Transformer layer or bypasses it through the residual connection. Tokens the router scores as needing more processing go through the full layer (attention and feed-forward), while the remaining tokens skip it and are passed on unchanged. This lowers the average compute per forward pass while keeping full-layer capacity available for the tokens that need it. It can also be combined with mixture-of-experts layers, so that both the layers a token passes through and the experts it uses are chosen dynamically.
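
The routing decision can be illustrated with a small, dependency-free sketch. It is not a reference implementation, and it rests on several assumptions that the definition above does not state: the router is a single linear projection (`router_weights`), each layer processes a fixed number of tokens (`capacity`) chosen by highest router score, and the layer output is scaled by a sigmoid of that score before being added to the residual. The names `mod_layer`, `toy_block`, and `capacity` are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mod_layer(
    tokens: List[Vector],
    router_weights: Vector,
    block: Callable[[Vector], Vector],
    capacity: int,
) -> List[Vector]:
    """Run `block` on the `capacity` highest-scoring tokens; pass the rest through.

    Assumption: top-k selection by a linear router score. Other routing
    rules (for example a per-token threshold) are equally possible.
    """
    # One scalar router score per token.
    scores = [dot(tok, router_weights) for tok in tokens]

    # Indices of the tokens that get the full layer.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    out: List[Vector] = []
    for i, tok in enumerate(tokens):
        if i in selected:
            # Routed token: residual plus gated block output.
            # Assumption: a sigmoid gate on the router score.
            gate = 1.0 / (1.0 + math.exp(-scores[i]))
            update = block(tok)
            out.append([x + gate * u for x, u in zip(tok, update)])
        else:
            # Skipped token: the residual shortcut copies it unchanged.
            out.append(list(tok))
    return out


# Toy stand-in for an attention + feed-forward block (illustrative only).
def toy_block(v: Vector) -> Vector:
    return [1.0 for _ in v]


tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, -1.0]]
router_weights = [1.0, 0.0]
out = mod_layer(tokens, router_weights, toy_block, capacity=2)
```

With `capacity=2`, only the two tokens with the highest router scores (the first and third) are updated by `toy_block`; the second and fourth are returned unchanged. In this sketch the cost of the layer therefore scales with `capacity` rather than with the sequence length, which is the source of the compute savings described above.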