# Stage 9: Parameter-Efficient Fine-Tuning (PEFT)

Adapting large models without breaking the bank
## Overview
Modern LLMs have billions of parameters. Fine-tuning all of them is:

- Expensive: a 7B model needs ~28GB just to hold the weights in fp32, and training adds gradients and optimizer states on top of that
- Slow: every optimizer step updates billions of parameters
- Wasteful: most parameters don't need to change much

PEFT methods solve this by training only a tiny fraction of the parameters while keeping the rest of the model frozen.

> "Fine-tuning 1% of parameters can achieve 99% of full fine-tuning performance."
## The Key Insight
Research shows that weight updates during fine-tuning have low intrinsic rank. This means:
- The change from pretrained weights to fine-tuned weights can be approximated with far fewer parameters
- We don't need to update 7 billion parameters—a few million carefully placed parameters suffice
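Concretely, instead of learning a full update ΔW with the same shape as W, low-rank methods learn two thin matrices whose product approximates it. A minimal NumPy sketch; the hidden size `d = 4096` and rank `r = 8` are illustrative assumptions, not values fixed by this course:

```python
import numpy as np

d, r = 4096, 8          # hidden size and low-rank dimension (illustrative values)

# A full update delta_W has d x d entries...
full_update_params = d * d                    # 16,777,216

# ...but a rank-r update is the product of two thin matrices.
B = np.random.randn(d, r)                     # d x r
A = np.random.randn(r, d)                     # r x d
delta_W = B @ A                               # still d x d, but rank at most r
low_rank_params = B.size + A.size             # 65,536

print(f"{full_update_params // low_rank_params}x fewer trainable parameters")  # 256x
```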
## Methods We'll Cover

| Method | Key Idea | Trainable Parameters |
|---|---|---|
| LoRA | Low-rank weight updates | ~0.1-1% |
| Adapters | Bottleneck layers | ~1-5% |
| Prefix Tuning | Learned key/value prefixes | ~0.01% |
| Prompt Tuning | Soft input prompts | ~0.001% |
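To make the most popular of these concrete before the full implementation in `code/stage-09/peft.py`, here is a minimal sketch of a LoRA-wrapped linear layer in plain PyTorch. The class name `LoRALinear` and the defaults `r=8`, `alpha=16` are illustrative choices, not the course's canonical implementation:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = base(x) + (x A^T) B^T * scale."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)        # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 65,536 trainable
```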
## Why This Matters
For a 7B-parameter model:

| Method | Trainable Params | GPU Memory (approx.) |
|---|---|---|
| Full fine-tuning | 7B | ~112GB (fp32 weights + gradients + Adam states) |
| LoRA (r=8) | ~4M | ~8GB |
| Prompt tuning | ~80K | ~2GB |
That's the difference between needing data-center hardware and training on a single consumer GPU.
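Where does the ~4M figure for LoRA come from? A sketch of the arithmetic, assuming a LLaMA-7B-like shape (32 layers, hidden size 4096) with r=8 applied only to the query and value projections; these architectural assumptions are illustrative:

```python
# Rough count of LoRA trainable parameters for a 7B-class decoder.
# Assumptions (illustrative): 32 layers, hidden size 4096, r=8,
# LoRA applied to the query and value projections only.
n_layers, d_model, r = 32, 4096, 8
adapted_matrices_per_layer = 2                  # q_proj and v_proj

params_per_matrix = d_model * r + r * d_model   # A (r x d) plus B (d x r)
total = n_layers * adapted_matrices_per_layer * params_per_matrix
print(f"{total:,}")                             # 4,194,304  (~4M, ~0.06% of 7B)
```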
## Learning Objectives
By the end of this stage, you will:
- Understand why PEFT works (the low-rank hypothesis)
- Implement LoRA from scratch
- Implement adapters with bottleneck architecture
- Understand prefix and prompt tuning
- Know when to use each method
## Sections
- The Fine-Tuning Problem - Why full fine-tuning is hard
- LoRA: Low-Rank Adaptation - The most popular PEFT method
- Adapter Layers - Bottleneck modules
- Prefix and Prompt Tuning - Learning soft prompts
- Choosing a Method - Trade-offs and recommendations
- Implementation - Building PEFT from scratch
## Prerequisites
- Understanding of transformer architecture (Stage 6)
- Familiarity with backpropagation (Stage 2)
- Experience with optimization (Stage 4)
## Key Insight
PEFT isn't about approximating full fine-tuning—it's about finding the right subspace for adaptation. Often, this subspace is tiny compared to the full parameter space.
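One way to internalize this: freeze everything, attach a small set of new parameters, and check what fraction of the model is actually trainable. A toy PyTorch sketch (the two-layer stand-in model and the `adapter` attribute are purely illustrative):

```python
import torch.nn as nn

# Toy stand-in for a pretrained model (illustrative sizes).
model = nn.Sequential(nn.Linear(4096, 4096), nn.Linear(4096, 4096))

# Freeze the "pretrained" weights.
for p in model.parameters():
    p.requires_grad_(False)

# Attach a small trainable adaptation module: the subspace we actually train.
model.adapter = nn.Sequential(nn.Linear(4096, 16), nn.Linear(16, 4096))

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable fraction: {trainable / total:.4%}")
```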
Code & Resources¶
| Resource | Description |
|---|---|
code/stage-09/peft.py |
LoRA, Adapters, and Prompt Tuning |
code/stage-09/tests/ |
Test suite |
| Exercises | Practice problems |
| Common Mistakes | Debugging guide |