Attention Visualizer
Explore how self-attention works in transformers. Enter a sentence, see how queries match keys, and observe how attention weights determine which tokens influence each output position.
Input Sentence
Click a query token to see what it attends to:
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V
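In code, the formula above is a matrix multiply, a scaling by √dₖ, a row-wise softmax, and a weighted sum of the values. A minimal NumPy sketch (the function name and toy shapes are illustrative, not the visualizer's internals):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q Kᵀ / √d_k) V for a single sequence."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) raw scores
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys (rows sum to 1)
    return weights @ V, weights                       # output and attention matrix

# Toy example: 4 tokens, d_k = 64 (matching the default dimension below)
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 64))
K = rng.normal(size=(4, 64))
V = rng.normal(size=(4, 64))
out, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))  # row i shows how much query i attends to each key
```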
Attention Pattern
Random (Untrained)
Self-Attention (Identity)
Previous Token
Next Token
Syntactic (Subject-Verb)
Distance-Based
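The presets above are stylized attention matrices rather than learned ones. A hedged sketch of how two of them could be constructed (these helper names and the decay parameter are assumptions for illustration, not the visualizer's code):

```python
import numpy as np

def previous_token_pattern(n):
    """Each token attends fully to the token before it; the first attends to itself."""
    A = np.zeros((n, n))
    A[0, 0] = 1.0
    for i in range(1, n):
        A[i, i - 1] = 1.0
    return A

def distance_based_pattern(n, decay=1.0):
    """Attention falls off with token distance, normalized so each row sums to 1."""
    idx = np.arange(n)
    scores = -decay * np.abs(idx[:, None] - idx[None, :])
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

print(previous_token_pattern(4))
print(distance_based_pattern(4).round(2))
```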
Display Mode
Matrix
Bipartite
Masking
No Mask (Bidirectional)
Causal Mask (GPT-style)
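With the causal (GPT-style) mask, each query may attend only to itself and earlier tokens: future positions are set to −∞ before the softmax, so their weights become exactly zero. A small sketch under that assumption:

```python
import numpy as np

def causal_attention_weights(scores):
    """Mask out future positions (set to -inf) before the softmax, GPT-style."""
    n = scores.shape[-1]
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # True strictly above the diagonal
    masked = np.where(mask, -np.inf, scores)
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)                           # exp(-inf) = 0 for future tokens
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.default_rng(1).normal(size=(4, 4))
print(causal_attention_weights(scores).round(2))       # upper triangle is exactly 0
```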
Temperature:
1.0
Dimension (dₖ):
64
Randomize Q, K, V
Reset
Statistics
Selected Token:
-
Max Attention:
-
Entropy:
-
Scale Factor:
-
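Max Attention is the largest weight in the selected query's row, Entropy measures how evenly that row is spread (0 for a one-hot row, log₂ n for a uniform one), and Scale Factor is the 1/√dₖ term from the formula above. A sketch of how these statistics could be computed (the function name, units in bits, and example row are illustrative):

```python
import numpy as np

def row_statistics(attn_row, d_k=64):
    """Summary statistics for one query's attention distribution."""
    max_attention = attn_row.max()
    p = attn_row[attn_row > 0]
    entropy = -np.sum(p * np.log2(p))       # Shannon entropy in bits
    scale_factor = 1.0 / np.sqrt(d_k)
    return max_attention, entropy, scale_factor

row = np.array([0.7, 0.2, 0.05, 0.05])
print(row_statistics(row))  # (0.7, ~1.26 bits, 0.125)
```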
Key Insight
Select a token to see attention patterns.