Cross-Attention
Attention in which the queries come from one sequence and the keys and values come from another.
Definition
Cross-attention is the attention operation in which the query vectors come from one sequence while the key and value vectors come from a different one. This lets a decoder selectively focus on relevant parts of an encoder's output when producing each token. It is the defining mechanism of encoder-decoder architectures and is also used in multimodal models so a language decoder can attend to image or audio features.
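As a rough illustration of the definition, here is a minimal single-head sketch in NumPy, assuming scaled dot-product attention. The function name `cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all dimensions are hypothetical choices for this example, not taken from any particular model or library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_states, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative sketch, no masking or batching)."""
    q = decoder_states @ w_q   # queries from the decoder sequence: (T_dec, d_k)
    k = encoder_states @ w_k   # keys from the encoder sequence:    (T_enc, d_k)
    v = encoder_states @ w_v   # values from the encoder sequence:  (T_enc, d_v)
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (T_dec, T_enc)
    weights = softmax(scores, axis=-1)        # each decoder position's weights sum to 1
    return weights @ v, weights

# Assumed toy sizes: 3 decoder positions, 5 encoder positions.
rng = np.random.default_rng(0)
d_model, d_k, d_v = 8, 4, 4
decoder_states = rng.normal(size=(3, d_model))
encoder_states = rng.normal(size=(5, d_model))
w_q = rng.normal(size=(d_model, d_k))
w_k = rng.normal(size=(d_model, d_k))
w_v = rng.normal(size=(d_model, d_v))

output, weights = cross_attention(decoder_states, encoder_states, w_q, w_k, w_v)
print(output.shape)   # (3, 4): one output vector per decoder position
print(weights.shape)  # (3, 5): one weight per (decoder, encoder) position pair
```

Passing the same sequence as both `decoder_states` and `encoder_states` would turn this sketch into self-attention. The only thing that makes it cross-attention is that the keys and values are computed from a different sequence than the queries.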