Attention Visualizer
Explore how self-attention works in transformers. Enter a sentence, see how queries match keys, and observe how attention weights determine which tokens influence each output position.
Input Sentence
Click a query token to see what it attends to:
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V
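In code, the formula above is a matrix multiply, a scaling by √dₖ, a row-wise softmax, and a weighted sum of the values. A minimal NumPy sketch (the function name and toy shapes are illustrative, not the visualizer's internals):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q Kᵀ / √d_k) V for a single sequence."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # (seq_len, seq_len) raw scores
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys (rows sum to 1)
    return weights @ V, weights                       # output and attention matrix

# Toy example: 4 tokens, d_k = 64 (matching the default dimension below)
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 64))
K = rng.normal(size=(4, 64))
V = rng.normal(size=(4, 64))
out, attn = scaled_dot_product_attention(Q, K, V)
print(attn.round(2))  # row i shows how much query i attends to each key
```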
Attention Pattern
Random (Untrained)
Self-Attention (Identity)
Previous Token
Next Token
Syntactic (Subject-Verb)
Distance-Based
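The presets above are stylized attention matrices rather than learned ones. A hedged sketch of how two of them could be constructed (these helper names and the decay parameter are assumptions for illustration, not the visualizer's code):

```python
import numpy as np

def previous_token_pattern(n):
    """Each token attends fully to the token before it; the first attends to itself."""
    A = np.zeros((n, n))
    A[0, 0] = 1.0
    for i in range(1, n):
        A[i, i - 1] = 1.0
    return A

def distance_based_pattern(n, decay=1.0):
    """Attention falls off with token distance, normalized so each row sums to 1."""
    idx = np.arange(n)
    scores = -decay * np.abs(idx[:, None] - idx[None, :])
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

print(previous_token_pattern(4))
print(distance_based_pattern(4).round(2))
```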
Display Mode
Matrix
Bipartite
Masking
No Mask (Bidirectional)
Causal Mask (GPT-style)
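With the causal (GPT-style) mask, each query may attend only to itself and earlier tokens: future positions are set to −∞ before the softmax, so their weights become exactly zero. A small sketch under that assumption:

```python
import numpy as np

def causal_attention_weights(scores):
    """Mask out future positions (set to -inf) before the softmax, GPT-style."""
    n = scores.shape[-1]
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # True strictly above the diagonal
    masked = np.where(mask, -np.inf, scores)
    masked -= masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)                           # exp(-inf) = 0 for future tokens
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.random.default_rng(1).normal(size=(4, 4))
print(causal_attention_weights(scores).round(2))       # upper triangle is exactly 0
```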
Temperature:
1.0
Dimension (dₖ):
64
Randomize Q, K, V
Reset
Statistics
Selected Token:
-
Max Attention:
-
Entropy:
-
Scale Factor:
-
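Max Attention is the largest weight in the selected query's row, Entropy measures how evenly that row is spread (0 for a one-hot row, log₂ n for a uniform one), and Scale Factor is the 1/√dₖ term from the formula above. A sketch of how these statistics could be computed (the function name, units in bits, and example row are illustrative):

```python
import numpy as np

def row_statistics(attn_row, d_k=64):
    """Summary statistics for one query's attention distribution."""
    max_attention = attn_row.max()
    p = attn_row[attn_row > 0]
    entropy = -np.sum(p * np.log2(p))       # Shannon entropy in bits
    scale_factor = 1.0 / np.sqrt(d_k)
    return max_attention, entropy, scale_factor

row = np.array([0.7, 0.2, 0.05, 0.05])
print(row_statistics(row))  # (0.7, ~1.26 bits, 0.125)
```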
Key Insight
Select a token to see attention patterns.