Interactive Visualizations¶
Explore the concepts from each stage with these interactive tools. All visualizations run entirely in your browser—no server required.
- **Attention Visualizer**: Explore how self-attention works. See how queries match keys, observe attention weights, and understand causal masking. (From Stage 5: Attention)
- **Gradient Descent Visualizer**: Watch optimizers navigate loss landscapes. Compare SGD, momentum, RMSprop, and Adam on different surfaces. (From Stage 4: Optimization)
- **Autograd Visualizer**: Watch automatic differentiation in action. Build computational graphs, run forward passes, and see gradients flow backward. (From Stage 2: Automatic Differentiation)
- **Temperature Sampling**: See how temperature transforms probability distributions. Experiment with different temperatures and sample tokens. (From Stage 1: Markov Chains)
- **N-gram State Machine**: Visualize Markov chains as state machines. Train on custom text, watch state transitions, and generate text step-by-step. (From Stage 1: Markov Chains)
What These Visualizations Teach¶
Attention Visualizer (Stage 5) — NEW¶
The attention visualizer demonstrates the core concepts from Sections 5.1-5.7:
- Query-Key matching: See how queries find relevant keys
- Attention weights: Observe the softmax distribution over positions
- Causal masking: Enable GPT-style masking to prevent future attention
- Temperature effects: Watch attention sharpen or soften with temperature
Try these experiments:
| Pattern | Masking | What it demonstrates |
|---|---|---|
| Random | None | Untrained attention, spread distribution |
| Self-Attention | None | Each position attends to itself |
| Previous Token | Causal | Local context pattern |
| Syntactic | None | Content word relationships |
| Distance-Based | Causal | Positional attention decay |
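For reference, the computation the visualizer animates fits in a few lines of NumPy: scaled dot-product attention with optional causal masking and temperature scaling. This is a minimal sketch for illustration, not the visualizer's JavaScript code; the sequence length, head dimension, and the `attention` helper are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False, temperature=1.0):
    """Scaled dot-product attention for one sequence of shape (seq_len, d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # query-key matching
    scores = scores / temperature          # lower T sharpens, higher T softens
    if causal:
        # Each position may attend only to itself and earlier positions.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)     # one distribution per query position
    return weights @ V, weights

# Example: 4 positions, 8-dimensional vectors, GPT-style causal masking.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V, causal=True, temperature=0.5)
print(np.round(w, 2))   # upper triangle is zero; each row sums to 1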
Gradient Descent Visualizer (Stage 4)¶
The optimizer visualizer demonstrates the core concepts from Sections 4.2-4.5:
- Loss landscapes: See how different surfaces create different optimization challenges
- Optimizer comparison: Watch how SGD, momentum, and Adam behave differently
- Hyperparameter effects: Explore how learning rate and momentum coefficients affect convergence
- Condition number: Observe zigzagging on elongated valleys
Try these experiments:
| Surface | Optimizer | What it demonstrates |
|---|---|---|
| Elongated Valley | SGD | Zigzag problem, slow convergence |
| Elongated Valley | Momentum | Dampens oscillation, faster |
| Rosenbrock | Adam | Navigates curved valleys |
| Saddle Point | Any | Escape behavior (or getting stuck) |
| Rastrigin | Adam | Local minima challenges |
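The zigzag experiment can also be reproduced numerically. The sketch below compares plain SGD with momentum on an elongated quadratic valley; the specific loss, learning rate, and momentum coefficient are illustrative choices for this example, not the visualizer's presets.

```python
import numpy as np

def grad(p):
    # Gradient of an elongated valley f(x, y) = 0.5*x**2 + 50*y**2.
    # The large condition number (100) is what makes plain SGD struggle.
    x, y = p
    return np.array([x, 100.0 * y])

def sgd(p, lr, steps):
    for _ in range(steps):
        p = p - lr * grad(p)
    return p

def momentum(p, lr, beta, steps):
    v = np.zeros_like(p)
    for _ in range(steps):
        v = beta * v - lr * grad(p)   # velocity accumulates past gradients
        p = p + v
    return p

start = np.array([-4.0, 2.0])
lr = 0.019   # just below the stability limit (2/100) set by the steep direction

# The first few SGD steps in y show the zigzag: the sign flips every step.
p, ys = start.copy(), []
for _ in range(5):
    ys.append(p[1])
    p = p - lr * grad(p)
print("SGD y per step:", np.round(ys, 2))

# Distance from the minimum (0, 0) after 100 steps, same learning rate.
print("SGD:     ", round(float(np.linalg.norm(sgd(start, lr, 100))), 3))
print("Momentum:", round(float(np.linalg.norm(momentum(start, lr, 0.9, 100))), 3))
```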
N-gram State Machine (Stage 1)¶
The n-gram visualizer demonstrates the core concepts from Sections 1.1-1.3:
- State machine view: Markov chains are finite state automata
- Training = counting: Watch how observations become transition probabilities
- Generation: Sample from the model one token at a time
- Context dependence: See how history determines the next token distribution
Try these experiments:
| Training Text | What it demonstrates |
|---|---|
| `abab` | Deterministic patterns |
| `the cat sat on the mat` | Natural language structure |
| `to be or not to be` | Repeated patterns create loops |
| `aaaaabbbbb` | Imbalanced distributions |
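The "training = counting" idea fits in a few lines of Python. The sketch below builds a character-level bigram model and samples from it; it mirrors the visualizer's behavior in spirit, but the character-level tokenization and helper names are assumptions for this example, not the visualizer's actual code.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    """Training is just counting: how often does each token follow each state?"""
    tokens = list(text)                       # character-level for simplicity
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    # Normalize counts into transition probabilities.
    return {state: {t: c / sum(ctr.values()) for t, c in ctr.items()}
            for state, ctr in counts.items()}

def generate(model, start, length=20, seed=0):
    random.seed(seed)
    state, out = start, [start]
    for _ in range(length):
        probs = model.get(state)
        if not probs:                         # dead end: no observed successor
            break
        tokens, weights = zip(*probs.items())
        state = random.choices(tokens, weights=weights)[0]
        out.append(state)
    return "".join(out)

model = train_bigram("the cat sat on the mat")
print(model["t"])            # P(next char | current char = 't'): {'h': 0.5, ' ': 0.5}
print(generate(model, "t"))
```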
Autograd Visualizer (Stage 2)¶
The autograd visualizer demonstrates the core concepts from Sections 2.4-2.6:
- Computational graphs: See how mathematical expressions become directed acyclic graphs
- Forward pass: Watch values propagate from inputs to outputs
- Backward pass: Observe gradients flow in reverse via the chain rule
- Local gradients: Each operation contributes its local derivative
Try these expressions to explore different patterns:
| Expression | What it demonstrates |
|---|---|
| `(x + y) * z` | Basic operations, gradient accumulation |
| `x * x + y * y` | Sum of squares, independent gradients |
| `(x * y) + (y * z)` | Shared variable (`y` appears twice) |
| `x * x * x` | Power rule in action |
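For readers who want the idea in code, here is a minimal scalar autograd sketch in the spirit of the Stage 2 Value class. It is a simplified illustration (addition and multiplication only), not the book's exact implementation.

```python
class Value:
    """A scalar that records how it was computed, so gradients can flow back."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None   # pushes this node's grad to its children

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad        # d(a+b)/da = 1
            other.grad += out.grad       # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(a*b)/da = b
            other.grad += self.data * out.grad   # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for node in reversed(topo):
            node._backward()

# (x + y) * z with x=2, y=3, z=4: d/dx = d/dy = z = 4, d/dz = x + y = 5.
x, y, z = Value(2.0), Value(3.0), Value(4.0)
out = (x + y) * z
out.backward()
print(out.data, x.grad, y.grad, z.grad)   # 20.0 4.0 4.0 5.0
```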
Temperature Sampling (Stage 1)¶
The temperature explorer demonstrates concepts from Section 1.6:
- Probability distributions: How language models represent uncertainty
- Temperature scaling: The formula P_T(t) = P(t)^(1/T) / Z
- Entropy: How "spread out" the distribution is
- Effective vocabulary: Perplexity as the "equivalent uniform vocabulary size"
Key insights to discover:
| Temperature | Effect | Use case |
|---|---|---|
| T → 0 | Greedy (argmax) | Deterministic outputs |
| T = 0.7 | Slightly focused | Coherent generation |
| T = 1.0 | Original distribution | Balanced sampling |
| T = 1.5 | More random | Creative writing |
| T → ∞ | Uniform | Maximum diversity |
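The temperature formula is easy to experiment with directly. The sketch below applies P_T(t) = P(t)^(1/T) / Z to a made-up next-token distribution and reports the resulting entropy and 2^entropy "effective vocabulary"; the distribution and the temperature values are arbitrary illustrative choices.

```python
import numpy as np

def apply_temperature(p, T):
    """Rescale a probability distribution: P_T(t) = P(t)^(1/T) / Z."""
    scaled = p ** (1.0 / T)
    return scaled / scaled.sum()

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()            # in bits

p = np.array([0.5, 0.25, 0.15, 0.07, 0.03])   # made-up next-token probabilities
for T in [0.2, 0.7, 1.0, 1.5, 5.0]:
    q = apply_temperature(p, T)
    H = entropy(q)
    print(f"T={T:<4} top prob={q.max():.2f}  entropy={H:.2f} bits  "
          f"effective vocab={2**H:.2f}")
```

Running it shows the table above in miniature: low temperatures concentrate mass on the top token, T = 1 leaves the distribution unchanged, and high temperatures push it toward uniform.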
Technical Notes¶
These visualizations are built with:
- Vanilla JavaScript: No build tools required, matching the "from scratch" philosophy
- D3.js: For data-driven visualization
- Portable design: Work offline, embed anywhere
The autograd visualizer is a direct port of the Python Value class from Stage 2, demonstrating that the same concepts work across languages.