Stage 2 Exercises

Conceptual Questions

Exercise 2.1: Gradient Intuition

For f(x) = x², we know f'(x) = 2x.

a) At x = 3, which direction decreases f(x)?
b) If we take a step of size 0.1 in the negative gradient direction, what is the new x?
c) What is the new f(x)? Is it smaller?
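
A quick numeric self-check sketch for your answers (plain Python, using nothing beyond the formulas above):

x = 3.0
grad = 2 * x                  # f'(3) = 6, so f decreases in the negative-x direction
x_new = x - 0.1 * grad        # one step of size 0.1 against the gradient: 3 - 0.6 = 2.4
print(x ** 2, x_new ** 2)     # 9.0 -> 5.76, so the new f(x) is indeed smaller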

Exercise 2.2: Chain Rule by Hand

Compute the derivative of f(x) = sin(x²) step by step.

a) Identify the outer function and inner function
b) Apply the chain rule
c) Verify using the computational graph approach
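
One way to check part (b) is a finite-difference comparison against the hand-derived result (a sketch using only the standard library):

import math

def f(x):
    return math.sin(x ** 2)

def f_prime(x):
    # chain rule: outer sin(u), inner u = x^2, so f'(x) = cos(x^2) * 2x
    return math.cos(x ** 2) * 2 * x

x, eps = 1.3, 1e-6
numerical = (f(x + eps) - f(x - eps)) / (2 * eps)
print(f_prime(x), numerical)  # should agree to roughly six decimal places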

Exercise 2.3: Graph Topology

Draw the computational graph for:

z = (a + b) * (a - b)

a) How many nodes? How many edges?
b) What are the intermediate values for a=3, b=2?
c) Compute ∂z/∂a and ∂z/∂b using backpropagation
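
For part (c), a minimal by-hand backpropagation sketch over the two intermediate nodes:

a, b = 3.0, 2.0
u = a + b           # sum node: 5.0
v = a - b           # difference node: 1.0
z = u * v           # product node: 5.0

# backward pass: seed dz/dz = 1, then apply each node's local derivative
grad_u = v                            # dz/du = v = 1
grad_v = u                            # dz/dv = u = 5
grad_a = grad_u * 1 + grad_v * 1      # a feeds both nodes: 1 + 5 = 6
grad_b = grad_u * 1 + grad_v * (-1)   # b enters v with a minus sign: 1 - 5 = -4
print(grad_a, grad_b)  # consistent with z = a^2 - b^2: dz/da = 2a, dz/db = -2b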

Exercise 2.4: Forward vs Reverse Mode

For f: R^n → R^m:

a) When is forward mode more efficient?
b) When is reverse mode more efficient?
c) Why does deep learning almost always use reverse mode?


Implementation Exercises

Exercise 2.5: Implement Division

Add division to the autograd system:

from typing import Tuple

class Div(Op):
    """z = a / b"""

    def forward(self, a: float, b: float) -> float:
        # TODO: Implement
        pass

    def backward(self, grad_output: float) -> Tuple[float, float]:
        # d(a/b)/da = 1/b
        # d(a/b)/db = -a/b²
        # TODO: Implement
        pass

Test: Verify gradients numerically for a=6, b=2.
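
One possible way to fill in the TODOs (a sketch, assuming forward is always called before backward so the inputs can be cached on the instance):

class Div(Op):
    """z = a / b"""

    def forward(self, a: float, b: float) -> float:
        self.a, self.b = a, b   # cache inputs for the backward pass
        return a / b

    def backward(self, grad_output: float) -> Tuple[float, float]:
        grad_a = grad_output / self.b                   # d(a/b)/da = 1/b
        grad_b = grad_output * (-self.a / self.b ** 2)  # d(a/b)/db = -a/b^2
        return grad_a, grad_b

For a=6, b=2 and grad_output=1, this returns (0.5, -1.5), which the numerical check from Exercise 2.8 should reproduce.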

Exercise 2.6: Implement Power

Add the power operation x^n:

class Pow(Op):
    """z = a^n where n is a constant"""

    def __init__(self, n: float):
        self.n = n

    def forward(self, a: float) -> float:
        # TODO
        pass

    def backward(self, grad_output: float) -> float:
        # d(a^n)/da = n * a^(n-1)
        # TODO
        pass
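
A sketch of one possible solution, caching the input the same way as Div above:

class Pow(Op):
    """z = a^n where n is a constant"""

    def __init__(self, n: float):
        self.n = n

    def forward(self, a: float) -> float:
        self.a = a      # cache the input for the backward pass
        return a ** self.n

    def backward(self, grad_output: float) -> float:
        # d(a^n)/da = n * a^(n-1)
        return grad_output * self.n * self.a ** (self.n - 1)

Sanity check: Pow(3) at a=2 gives forward 8 and backward(1.0) = 12, matching 3 * 2^2.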

Exercise 2.7: Sigmoid Gradient

Implement sigmoid and its gradient:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_backward(x, grad_output):
    # Hint: d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    # TODO
    pass
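
A minimal sketch of the backward function; recomputing sigmoid(x) keeps it simple, though caching the forward output would avoid the extra exp:

def sigmoid_backward(x, grad_output):
    s = sigmoid(x)
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    return grad_output * s * (1 - s)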

Exercise 2.8: Numerical Gradient Check

Implement a function to verify gradients numerically:

def check_gradient(f, x, epsilon=1e-5):
    """
    Compare analytical gradient to numerical approximation.

    Numerical gradient: (f(x+ε) - f(x-ε)) / (2ε)
    """
    # TODO: Implement
    # Return True if analytical and numerical match within tolerance
    pass
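
One possible sketch. It assumes f returns a (value, analytical_gradient) pair -- an assumption of this sketch, not part of the exercise -- so adapt it to however your autograd exposes gradients:

def check_gradient(f, x, epsilon=1e-5, tol=1e-6):
    """Return True if the analytical gradient matches the central difference."""
    _, analytical = f(x)  # assumed convention: f returns (value, gradient)
    numerical = (f(x + epsilon)[0] - f(x - epsilon)[0]) / (2 * epsilon)
    return abs(analytical - numerical) < tol

print(check_gradient(lambda t: (t * t, 2 * t), 3.0))  # True: both give ~6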

Challenge Exercises

Exercise 2.9: Matrix Gradient

For matrix multiplication Y = XW where X is (N, D) and W is (D, M):

a) What is the shape of ∂L/∂W given ∂L/∂Y?
b) Derive the formula for ∂L/∂X
c) Implement both gradients

def matmul_backward(X, W, grad_Y):
    """
    Returns: (grad_X, grad_W)
    """
    # TODO
    pass
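
A sketch of the standard result. Matching shapes is the quickest way to remember it: grad_X must be (N, D) and grad_W must be (D, M):

import numpy as np

def matmul_backward(X, W, grad_Y):
    """
    Returns: (grad_X, grad_W) for Y = X @ W
    """
    grad_X = grad_Y @ W.T   # (N, M) @ (M, D) -> (N, D), same shape as X
    grad_W = X.T @ grad_Y   # (D, N) @ (N, M) -> (D, M), same shape as W
    return grad_X, grad_W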

Exercise 2.10: Build a Simple Neural Network

Using your autograd system, build a 2-layer neural network:

def forward(x, W1, b1, W2, b2):
    """
    z1 = x @ W1 + b1
    a1 = relu(z1)
    z2 = a1 @ W2 + b2
    return z2
    """
    pass

def backward(x, W1, b1, W2, b2, grad_output):
    """
    Return gradients for all parameters.
    """
    pass
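
A sketch of one solution. Note that it changes the signatures slightly: forward also returns a cache of intermediates so backward does not have to recompute them (an alternative is to recompute z1 inside backward):

import numpy as np

def forward(x, W1, b1, W2, b2):
    z1 = x @ W1 + b1
    a1 = np.maximum(z1, 0)      # relu
    z2 = a1 @ W2 + b2
    return z2, (z1, a1)         # cache intermediates for the backward pass

def backward(x, W1, b1, W2, b2, grad_output, cache):
    z1, a1 = cache
    grad_W2 = a1.T @ grad_output
    grad_b2 = grad_output.sum(axis=0)
    grad_a1 = grad_output @ W2.T
    grad_z1 = grad_a1 * (z1 > 0)   # relu passes gradient only where z1 > 0
    grad_W1 = x.T @ grad_z1
    grad_b1 = grad_z1.sum(axis=0)
    return grad_W1, grad_b1, grad_W2, grad_b2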

Exercise 2.11: Gradient Flow Analysis

Consider a 10-layer network in which each layer multiplies its input by 2 (so, during backpropagation, each layer also multiplies the incoming gradient by 2).

a) If the final gradient is 1, what is the gradient at layer 1?
b) What if each layer multiplies by 0.5?
c) This demonstrates vanishing/exploding gradients. Propose a solution.
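
A two-loop numeric check for parts (a) and (b):

g = 1.0
for _ in range(10):
    g *= 2.0
print(g)   # 1024.0 -- the gradient explodes by a factor of 2^10

g = 1.0
for _ in range(10):
    g *= 0.5
print(g)   # 0.0009765625 -- the gradient vanishes by a factor of 2^-10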


Checking Your Work

  • Test suite: See code/stage-02/tests/test_value.py for expected behavior
  • Reference implementation: Compare with code/stage-02/value.py
  • Self-check: Use numerical gradient checking to verify your derivatives

Mini-Project: Autograd Engine

Build a complete automatic differentiation engine that can train a small neural network.

Requirements

  1. Value class: Implement forward and backward for +, *, -, /, **
  2. Activations: Add tanh, relu, and sigmoid with proper gradients
  3. Training: Train a 2-layer MLP to learn XOR

Deliverables

  • [ ] Value class with all basic operations
  • [ ] Gradient checking (numerical vs. autograd)
  • [ ] XOR network that converges to <0.01 loss
  • [ ] Visualization of the computational graph (optional)

Extension

Add support for matrix operations (matmul, sum, mean). Can you train a simple image classifier?