## Section 8.2: Loss Curve Analysis

Reading time: 12 minutes
### The Loss Curve as a Diagnostic Tool

Your loss curve is your model's medical chart: learn to read it.
### Healthy Training

A healthy loss curve has three characteristic phases:

- Rapid initial descent: the model learns the easy patterns first
- Gradual slowdown: diminishing returns as it moves on to harder patterns
- Eventual plateau: model capacity is reached, or the learning rate is now too low to make further progress
### Pathological Patterns

#### Pattern 1: Explosion

Diagnosis: Gradients exploding

Action:
- Reduce LR by 10x immediately
- Add gradient clipping (see the sketch below)
- Check for NaN values in the data
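A minimal sketch of gradient clipping inside a PyTorch-style training step. The `model`, `optimizer`, `batch`, and `loss_fn` names are placeholders, and the `max_norm=1.0` threshold is an illustrative assumption, not a recommendation from this section:

```python
import torch

def training_step(model, optimizer, batch, loss_fn, max_norm=1.0):
    """One optimization step with gradient clipping (hypothetical helper)."""
    inputs, targets = batch
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Rescale gradients so their global L2 norm is at most max_norm;
    # the returned pre-clipping norm is worth logging.
    grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item(), grad_norm.item()
```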
#### Pattern 2: Flat Line

Diagnosis: Learning rate too low OR vanishing gradients OR a bug

Action:
- Check gradient norms (are they zero?)
- Run a one-batch overfit test (see the sketch below)
- Increase LR by 10x
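A minimal sketch of the one-batch overfit test, assuming a PyTorch model; the function name and the 500-step / `lr=1e-3` defaults are illustrative. A working model and training loop should be able to drive the loss on a single fixed batch close to zero; if it cannot, suspect a bug or dead gradients rather than data or capacity:

```python
import torch

def one_batch_overfit_test(model, loss_fn, batch, steps=500, lr=1e-3):
    """Try to drive the loss on a single fixed batch toward zero."""
    inputs, targets = batch
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
    # A final loss far from zero points to a bug in the model or loop.
    return loss.item()
```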
#### Pattern 3: Oscillation

Diagnosis: Learning rate too high

Action: Reduce LR by 2-5x
#### Pattern 4: Diverging Train/Val

Diagnosis: Overfitting

Action:
- Add regularization (dropout, weight decay)
- Stop early at the minimum validation loss (see the sketch below)
- Add data augmentation
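A minimal early-stopping sketch in plain Python; the class name and the `patience=5` default are illustrative assumptions:

```python
class EarlyStopping:
    """Stop training when val loss has not improved for `patience` checks."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, val_loss):
        # Reset the counter on any meaningful improvement, otherwise count
        # consecutive non-improving validation checks.
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience
```

Call `should_stop(val_loss)` after each validation pass and break out of the training loop when it returns `True`, ideally restoring the checkpoint with the best validation loss.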
#### Pattern 5: Slow Descent

Diagnosis: LR too low OR model too small

Action:
- Try a higher LR
- Increase model capacity
- Use LR warmup (see the sketch below)
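A minimal linear-warmup sketch using PyTorch's `LambdaLR` scheduler; the 1000-step warmup length and the AdamW/`3e-4` values in the usage comment are illustrative assumptions:

```python
import torch

def warmup_scheduler(optimizer, warmup_steps=1000):
    """Scale the LR linearly from near zero to its base value over warmup_steps."""
    def lr_lambda(step):
        return min(1.0, (step + 1) / warmup_steps)
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Usage (call scheduler.step() once per optimizer step):
# optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
# scheduler = warmup_scheduler(optimizer, warmup_steps=1000)
```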
### Implementing Loss Analysis

```python
from dataclasses import dataclass, field
from typing import Dict, List

import numpy as np


@dataclass
class TrainingHistory:
    """Records and analyzes training metrics."""
    loss: List[float] = field(default_factory=list)
    val_loss: List[float] = field(default_factory=list)
    grad_norm: List[float] = field(default_factory=list)

    def record(self, loss, grad_norm, val_loss=None):
        self.loss.append(loss)
        self.grad_norm.append(grad_norm)
        if val_loss is not None:
            self.val_loss.append(val_loss)

    def diagnose(self) -> Dict:
        """Analyze training history and diagnose issues."""
        issues = []
        recommendations = []
        if self._detect_explosion():
            issues.append('LOSS_EXPLOSION')
            recommendations.append('Reduce learning rate by 10x')
            recommendations.append('Add gradient clipping')
        if self._detect_plateau():
            issues.append('LOSS_PLATEAU')
            recommendations.append('Increase learning rate')
            recommendations.append('Check model capacity')
        if self._detect_overfitting():
            issues.append('OVERFITTING')
            recommendations.append('Add dropout')
            recommendations.append('Use early stopping')
        return {
            'status': ('critical' if 'LOSS_EXPLOSION' in issues else
                       'warning' if issues else 'healthy'),
            'issues': issues,
            'recommendations': recommendations,
        }

    def _detect_explosion(self):
        """Check for loss explosion (NaN or inf in recent losses)."""
        if len(self.loss) < 5:
            return False
        recent = self.loss[-5:]
        return any(np.isnan(l) or np.isinf(l) for l in recent)

    def _detect_plateau(self, window=50, threshold=0.001):
        """Check if loss has plateaued over the last `window` steps."""
        if len(self.loss) < window * 2:
            return False
        recent = self.loss[-window:]
        relative_change = (max(recent) - min(recent)) / np.mean(recent)
        return relative_change < threshold

    def _detect_overfitting(self):
        """Check for overfitting: val loss rising while train loss still falls."""
        if len(self.val_loss) < 20:
            return False
        # Train and val loss may be logged at different frequencies, so
        # compare each series against its own first and last quarters.
        n_val, n_train = len(self.val_loss), len(self.loss)
        early_val = np.mean(self.val_loss[:n_val // 4])
        late_val = np.mean(self.val_loss[-(n_val // 4):])
        early_train = np.mean(self.loss[:n_train // 4])
        late_train = np.mean(self.loss[-(n_train // 4):])
        return late_train < early_train and late_val > early_val
```
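A small self-contained sketch of how the diagnostics behave, using a synthetic loss series that decays and then flattens (the decay constants are arbitrary, chosen only to trigger the plateau check):

```python
import numpy as np

# Simulate a run whose loss plateaus, to exercise the diagnostics.
history = TrainingHistory()
for step in range(200):
    simulated_loss = 2.0 * np.exp(-step / 10) + 0.5  # decays, then flattens
    history.record(loss=simulated_loss, grad_norm=1.0)

report = history.diagnose()
print(report['status'])            # 'warning'
print(report['issues'])            # ['LOSS_PLATEAU']
print(report['recommendations'])   # ['Increase learning rate', 'Check model capacity']
```

In a real training loop you would call `history.record(...)` every step and `history.diagnose()` periodically, acting on a `'critical'` status immediately.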
### Smoothing for Clarity

Raw loss curves are noisy. Use an exponential moving average to reveal the trend:

```python
def smooth_loss(losses, factor=0.9):
    """Exponential moving average for cleaner visualization."""
    smoothed = []
    current = losses[0]
    for loss in losses:
        # Higher factor = heavier smoothing (more weight on the running average).
        current = factor * current + (1 - factor) * loss
        smoothed.append(current)
    return smoothed
```
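A usage sketch with matplotlib, overlaying the raw and smoothed curves; the synthetic noisy loss is purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

steps = np.arange(1000)
raw = 2.0 * np.exp(-steps / 200) + 0.1 * np.random.randn(1000)  # noisy synthetic loss

plt.plot(steps, raw, alpha=0.3, label='raw')
plt.plot(steps, smooth_loss(list(raw)), label='smoothed (factor=0.9)')
plt.xlabel('step')
plt.ylabel('loss')
plt.legend()
plt.show()
```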
### What to Log

At minimum, log these metrics every step:
| Metric | Why |
|---|---|
| Loss | Primary signal |
| Grad norm | Gradient health |
| Learning rate | Track schedule |
Log less frequently:
| Metric | Why |
|---|---|
| Val loss | Overfitting detection |
| Param norm | Weight growth |
| Per-layer grad norms | Identify problem layers |
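A sketch of collecting the per-step metrics in a PyTorch loop. The helper names are hypothetical, and `gradient_norm` assumes it is called after `loss.backward()` and before `optimizer.step()`:

```python
import torch

def gradient_norm(model):
    """Global L2 norm of all parameter gradients (call after backward())."""
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5

def current_lr(optimizer):
    """Learning rate of the first parameter group (enough for most schedules)."""
    return optimizer.param_groups[0]['lr']

# Per-step logging might then look like:
# history.record(loss=loss.item(), grad_norm=gradient_norm(model))
# logger.log({'loss': loss.item(),
#             'grad_norm': gradient_norm(model),
#             'lr': current_lr(optimizer)})   # logger is whatever tool you use
```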
### Summary
| Pattern | Meaning | Fix |
|---|---|---|
| Explosion | Gradients too large | Reduce LR, clip |
| Flat | No learning | Check bugs, increase LR |
| Oscillation | LR too high | Reduce LR |
| Train/Val diverge | Overfitting | Regularize |
| Very slow | LR too low | Increase LR |
Next: We'll dive deeper into gradient statistics as health indicators.