Architectures

Transformer

The neural-network architecture, built on attention, behind modern LLMs.

Definition

The Transformer is the neural network architecture introduced in the 2017 paper 'Attention Is All You Need.' It replaced recurrence with self-attention, letting the model weigh the relevance of every token to every other token in parallel. This parallelism made it scalable to massive datasets and parameter counts, and it underpins essentially every modern large language model.

Related terms

Attention KV Cache LLM Embedding