All terms
Architectures
Transformer
The neural-network architecture, built on attention, behind modern LLMs.
Definition
The Transformer is the neural network architecture introduced in the 2017 paper 'Attention Is All You Need.' It replaced recurrence with self-attention, letting the model weigh the relevance of every token to every other token in parallel. This parallelism made it scalable to massive datasets and parameter counts, and it underpins essentially every modern large language model.