# Stage 2: Automatic Differentiation
## From Calculus to Code: Building the Foundation of Deep Learning
In Stage 1, we built a Markov chain language model and found optimal parameters through counting—a closed-form solution. Neural networks are fundamentally different: there's no closed-form solution. We must search for good parameters by iteratively improving them.
This search requires knowing: if I change a parameter slightly, how does the output change?
This is the domain of automatic differentiation, the technique that makes training neural networks possible. By the end of this stage, you'll understand exactly how PyTorch and TensorFlow compute gradients, because you'll have built the same core machinery from scratch.
## What We'll Build
A complete automatic differentiation engine that can:
- Track computations as they happen
- Build computational graphs automatically
- Compute gradients via reverse-mode differentiation
- Train neural networks using gradient descent
## Sections
### 2.1: What is a Derivative?
The geometric and algebraic foundations. Why derivatives matter for optimization, and how they connect to machine learning.
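As a quick preview, the limit definition can be turned directly into code: nudge the input by a small step `h` and see how much the output moves. The function and step size below are toy choices of ours, not code from the section itself.

```python
# Approximate f'(x) straight from the limit definition:
# f'(x) ≈ (f(x + h) - f(x)) / h for a small step h.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

# f(x) = x**2 has derivative 2x, so the estimate at x = 3 should be close to 6.
print(numerical_derivative(lambda x: x ** 2, 3.0))  # ≈ 6.000001
```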
### 2.2: Derivative Rules from First Principles
Deriving the power, product, quotient, and exponential rules from the limit definition. We don't just state rules—we prove them.
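As a taste of that style, here is the simplest case, f(x) = x², worked straight from the limit definition:

```latex
\frac{d}{dx}\,x^{2}
  = \lim_{h \to 0} \frac{(x+h)^{2} - x^{2}}{h}
  = \lim_{h \to 0} \frac{2xh + h^{2}}{h}
  = \lim_{h \to 0} \,(2x + h)
  = 2x
```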
### 2.3: The Chain Rule — The Heart of Backpropagation
The most important derivative rule for deep learning. How derivatives chain through compositions, and why this leads directly to backpropagation.
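For a concrete instance (our own toy example, not taken from the section): if y = sin(x²), the chain rule gives dy/dx = cos(x²) · 2x, and a finite-difference estimate should agree with that formula.

```python
import math

# Chain rule: y = f(g(x)) with f = sin and g(x) = x**2,
# so dy/dx = f'(g(x)) * g'(x) = cos(x**2) * 2x.
def analytic(x):
    return math.cos(x ** 2) * 2 * x

def numerical(x, h=1e-6):
    f = lambda t: math.sin(t ** 2)
    return (f(x + h) - f(x - h)) / (2 * h)  # central difference

x = 1.3
print(analytic(x), numerical(x))  # the two values should agree to several decimals
```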
### 2.4: Computational Graphs
Representing computation as directed acyclic graphs. Forward passes, backward passes, and gradient accumulation.
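To make the data structure concrete before we get there, here is a rough sketch of the graph for e = (a * b) + c. The `Node` class and `topo_order` helper are illustrative names of our own, not the engine we build later.

```python
# A minimal picture of a computational graph node: each node remembers the
# operation that produced it and which nodes fed into it (its parents).
class Node:
    def __init__(self, value, parents=(), op=""):
        self.value = value      # result of the forward computation
        self.parents = parents  # nodes this one was computed from
        self.op = op            # which operation produced it
        self.grad = 0.0         # gradients accumulate here during the backward pass

def topo_order(root):
    """Parents-before-children ordering; a backward pass walks it in reverse."""
    order, seen = [], set()
    def visit(node):
        if node not in seen:
            seen.add(node)
            for p in node.parents:
                visit(p)
            order.append(node)
    visit(root)
    return order

# Build the graph for e = (a * b) + c by hand.
a, b, c = Node(2.0), Node(3.0), Node(4.0)
d = Node(a.value * b.value, parents=(a, b), op="*")
e = Node(d.value + c.value, parents=(d, c), op="+")
print([n.op or n.value for n in topo_order(e)])  # [2.0, 3.0, '*', 4.0, '+']
```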
### 2.5: Forward Mode vs Reverse Mode
Two fundamentally different ways to apply the chain rule. Why reverse mode is dramatically cheaper for neural networks, which have millions of parameters but a single scalar loss.
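One way to picture forward mode (a sketch of ours, not the section's code) is a "dual number" that carries a value together with its derivative with respect to one chosen input. Each input needs its own forward pass, whereas reverse mode recovers the derivative with respect to every input in a single backward pass.

```python
# Forward mode in miniature: a Dual carries a value and the derivative of that
# value with respect to ONE chosen input. One pass per input; n inputs, n passes.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule, applied as the multiplication happens.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

# d/dx of f(x, y) = x*x + x*y at (x, y) = (3, 2): seed x with derivative 1.
x, y = Dual(3.0, 1.0), Dual(2.0, 0.0)
out = x * x + x * y
print(out.value, out.deriv)  # 15.0, 8.0  (df/dx = 2x + y = 8)
```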
### 2.6: Building Autograd from Scratch
~100 lines of code that implement complete automatic differentiation. Building and training neural networks with our own engine.
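To set expectations, here is a heavily stripped-down sketch in that spirit, supporting only `+` and `*`. The actual reference implementation in `code/stage-02/value.py` covers more operations and differs in details.

```python
# A pared-down reverse-mode engine: each Value records how to push its gradient
# back to its parents, and backward() replays those rules in reverse
# topological order. Illustrative sketch only.
class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # how to send this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad           # d(out)/d(self) = 1
            other.grad += out.grad          # d(out)/d(other) = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad   # d(out)/d(self) = other
            other.grad += self.data * out.grad   # d(out)/d(other) = self
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# Gradients of (a * b + a) with respect to a and b, at a = 2, b = 3.
a, b = Value(2.0), Value(3.0)
out = a * b + a
out.backward()
print(a.grad, b.grad)  # 4.0 (= b + 1), 2.0 (= a)
```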
### 2.7: Testing and Validation
How to verify your gradients are correct. Numerical checking, property-based testing, and debugging strategies.
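The workhorse is the numerical gradient check, previewed here with plain Python functions rather than the engine (the function, point, and tolerance are illustrative choices of ours): compute a central-difference estimate and insist it agrees with the analytic gradient.

```python
import math

def f(x, y):
    return x * y + math.sin(x)

# Analytic gradient, worked out by hand: df/dx = y + cos(x), df/dy = x.
def grad_f(x, y):
    return (y + math.cos(x), x)

def numerical_grad(f, x, y, h=1e-5):
    # Central differences: perturb one argument at a time.
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (dfdx, dfdy)

x, y = 0.7, -1.2
analytic = grad_f(x, y)
numeric = numerical_grad(f, x, y)
for a, n in zip(analytic, numeric):
    assert abs(a - n) < 1e-6, (a, n)   # loose tolerance; exact equality is impossible
print("gradient check passed:", analytic, numeric)
```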
## Prerequisites
- Basic calculus (we'll derive everything from limits)
- Python programming
- Completion of Stage 1 (for context on why we need this)
## Key Takeaways
By the end of this stage, you will understand:
- Derivatives from first principles: Not just rules to memorize, but why they work
- The chain rule deeply: How it enables differentiating any composition
- Computational graphs: The data structure behind modern deep learning
- Why reverse mode wins: The complexity analysis that explains backpropagation's efficiency
- How to build autograd: The ~100 lines of core code that power gradient computation
- Testing gradients: Essential techniques for verifying correctness
## The Journey So Far
| Stage | Topic | Key Insight |
|---|---|---|
| 1 | Markov Chains | Language modeling is probability estimation over sequences |
| 2 | Automatic Differentiation | Gradients enable iterative optimization—no closed-form needed |
| 3 | (Coming) | Building our first neural language model |
## Let's Begin
The derivative is where it all starts. Understanding it deeply—not just as a formula, but as a concept—unlocks everything that follows.
→ Start with Section 2.1: What is a Derivative?
## Code & Resources
| Resource | Description |
|---|---|
| `code/stage-02/value.py` | Reference implementation |
| `code/stage-02/tests/` | Test suite |
| Exercises | Practice problems |
| Common Mistakes | Debugging guide |