MoE
A model with many expert sub-networks, only a few of which run per token.
Definition
A Mixture of Experts (MoE) model contains many specialized sub-networks ('experts') and a router that activates only a small subset of them for each token. Because only those few experts run, the model keeps the parameter capacity of a very large network while the compute spent per token stays close to that of a much smaller one, which is why many frontier models use MoE designs.
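The routing step can be illustrated with a small sketch. This is a toy, not any particular model's implementation: the names `moe_forward`, `make_expert`, and `router_weights` are hypothetical, each expert is a stand-in that simply scales its input, and the gating scheme (pick the top-k router scores, then softmax over just those) is one common variant assumed here for illustration.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical expert: a stand-in for a feed-forward sub-network.
    return lambda x: [scale * v for v in x]


def moe_forward(token, router_weights, experts, top_k=2):
    # Router: one score per expert (dot product of the token with that expert's row).
    scores = [sum(w * v for w, v in zip(row, token)) for row in router_weights]
    # Keep only the top_k highest-scoring experts; the rest never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Assumed gating: softmax over the chosen experts' scores only.
    gates = softmax([scores[i] for i in chosen])
    # Output is the gate-weighted sum of the chosen experts' outputs.
    output = [0.0] * len(token)
    for gate, i in zip(gates, chosen):
        expert_out = experts[i](token)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, chosen


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(scale=i + 1) for i in range(num_experts)]
router_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
token = [0.5, -1.0, 0.25, 2.0]

output, chosen = moe_forward(token, router_weights, experts, top_k=2)
print("experts used:", chosen, "of", num_experts)
```

With `top_k=2` and eight experts, only two expert functions are called for the token, even though all eight contribute to the model's total parameter count. That is the capacity-versus-compute trade described above.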