2 Environment Setup
Get your interpretability toolkit running in 10 minutes
This guide walks you through setting up everything you need to run the code in this book and start your own interpretability research.
2.1 Prerequisites Self-Assessment
Before diving in, check if you have the background needed. Don’t worry if you’re missing some—links are provided to fill gaps.
Can you read and understand this code?
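For example, a short snippet at roughly this level (an illustrative sketch, not taken from the book's notebooks):
import torch

def count_long_words(text, min_length=5):
    """Count words with at least min_length characters."""
    words = text.split()
    return sum(1 for word in words if len(word) >= min_length)

sentences = ["the cat sat on the mat", "interpretability is fascinating"]
counts = [count_long_words(s) for s in sentences]
print(counts)                        # [0, 2]

x = torch.arange(6).reshape(2, 3)    # a 2x3 tensor of the values 0..5
print(x.sum(dim=1))                  # tensor([ 3, 12])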
If not: Work through Python for Everybody (free) or the first few chapters of any Python tutorial.
Do you understand these concepts intuitively?
- Vectors: A list of numbers representing a point/direction in space
- Dot product: Measures how aligned two vectors are (its normalized form is cosine similarity)
- Matrix multiplication: Transforms vectors from one space to another
- Orthogonality: Two vectors at 90° have dot product = 0
If not: Watch 3Blue1Brown’s Essence of Linear Algebra (free, ~3 hours total, but first 4 videos are enough).
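If you want a quick sanity check of these ideas, the following PyTorch sketch (illustrative only) computes each one directly:
import torch

a = torch.tensor([1.0, 0.0])
b = torch.tensor([0.0, 1.0])   # orthogonal to a
c = torch.tensor([2.0, 0.0])   # same direction as a

print(torch.dot(a, b))                       # tensor(0.)  -> orthogonal vectors
print(torch.cosine_similarity(a, c, dim=0))  # tensor(1.)  -> perfectly aligned

W = torch.randn(3, 2)      # a 3x2 matrix maps 2-D vectors into 3-D space
print((W @ a).shape)       # torch.Size([3])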
Do you know what these are?
- Attention: Mechanism for tokens to “look at” other tokens
- MLP/FFN: Feed-forward layers that process each position independently
- Residual stream: The main “highway” that information flows through
- Embeddings: Converting tokens to vectors
If not: Don’t worry! Chapter 2 covers this from scratch. But if you want a head start, read The Illustrated Transformer (free, ~30 min).
Ready? If you can read Python and do basic PyTorch tensor operations, you have enough to start. The book explains everything else as needed.
2.2 Quick Start: Google Colab (Recommended)
The fastest way to get started—no installation required.
2.2.1 Step 1: Open Colab
Go to colab.research.google.com and create a new notebook.
2.2.2 Step 2: Enable GPU
- Click Runtime → Change runtime type
- Select T4 GPU (free tier) or A100 (if you have Colab Pro)
- Click Save
2.2.3 Step 3: Install Libraries
Run this cell:
# Core libraries for mechanistic interpretability
!pip install transformer-lens einops jaxtyping circuitsvis plotly
# For SAE work
!pip install sae-lens
# Verify installation
import transformer_lens
import sae_lens
print(f"TransformerLens: {transformer_lens.__version__}")
print(f"SAELens: {sae_lens.__version__}")2.2.4 Step 4: Verify Everything Works
import torch
import transformer_lens as tl
# Check GPU
print(f"GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
print(f"GPU: {torch.cuda.get_device_name(0)}")
# Load a model
model = tl.HookedTransformer.from_pretrained("gpt2-small")
print(f"Model loaded: {model.cfg.model_name}")
# Run a simple forward pass
tokens = model.to_tokens("Hello, world!")
logits, cache = model.run_with_cache(tokens)
print(f"Output shape: {logits.shape}")
print("✓ Everything works!")That’s it! You’re ready to start. The companion notebooks in this book all work in Colab.
2.3 Local Installation
For serious research, you’ll want a local setup.
2.3.1 Prerequisites
- Python 3.9+ (3.10 or 3.11 recommended)
- pip or conda
- (Optional) NVIDIA GPU with CUDA
2.3.2 Option A: Using pip (Recommended)
# Create a virtual environment
python -m venv interp-env
source interp-env/bin/activate # On Windows: interp-env\Scripts\activate
# Upgrade pip
pip install --upgrade pip
# Install PyTorch (choose the right version for your system)
# For CUDA 11.8:
pip install torch --index-url https://download.pytorch.org/whl/cu118
# For CUDA 12.1:
pip install torch --index-url https://download.pytorch.org/whl/cu121
# For CPU only:
pip install torch --index-url https://download.pytorch.org/whl/cpu
# Install interpretability libraries
pip install transformer-lens sae-lens einops jaxtyping circuitsvis plotly
# Install Jupyter
pip install jupyter ipywidgets
2.3.3 Option B: Using conda
# Create environment
conda create -n interp python=3.11
conda activate interp
# Install PyTorch with CUDA
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia
# Or CPU only:
conda install pytorch cpuonly -c pytorch
# Install interpretability libraries
pip install transformer-lens sae-lens einops jaxtyping circuitsvis plotly
# Install Jupyter
conda install jupyter ipywidgets
2.3.4 Verify Local Installation
import torch
import transformer_lens as tl
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"TransformerLens: {tl.__version__}")
# Quick test
model = tl.HookedTransformer.from_pretrained("gpt2-small")
logits, cache = model.run_with_cache("Test")
print("✓ Installation successful!")2.4 Library Reference
2.4.1 Core Libraries
| Library | Purpose | Install |
|---|---|---|
| transformer-lens | Load models, access activations, run interventions | pip install transformer-lens |
| sae-lens | Train and use sparse autoencoders | pip install sae-lens |
| einops | Tensor operations with readable syntax | pip install einops |
| circuitsvis | Visualize attention patterns | pip install circuitsvis |
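To give a flavour of the two less familiar entries, here is a small illustrative sketch of einops and jaxtyping (the function name residual_norm is just an example):
import torch
from einops import rearrange
from jaxtyping import Float

# einops: split GPT-2's 768-dim hidden state into 12 heads of 64 dims,
# writing the pattern out instead of chaining .view() and .permute()
x = torch.randn(2, 10, 768)                        # [batch, seq, d_model]
heads = rearrange(x, "b s (h d) -> b h s d", h=12)
print(heads.shape)                                 # torch.Size([2, 12, 10, 64])

# jaxtyping: document expected tensor shapes directly in type annotations
def residual_norm(resid: Float[torch.Tensor, "batch seq d_model"]) -> Float[torch.Tensor, "batch seq"]:
    return resid.norm(dim=-1)

print(residual_norm(x).shape)                      # torch.Size([2, 10])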
2.4.2 Optional Libraries
| Library | Purpose | Install |
|---|---|---|
| plotly | Interactive plots | pip install plotly |
| wandb | Experiment tracking | pip install wandb |
| nnsight | Alternative intervention library | pip install nnsight |
| baukit | Activation editing utilities | pip install baukit |
2.5 GPU Options
2.5.1 Free Options
| Option | GPU | VRAM | Time Limit | Notes |
|---|---|---|---|---|
| Google Colab | T4 | 15 GB | ~12 hrs/day | Best free option |
| Kaggle Notebooks | P100 / T4 | 16 GB | 30 hrs/week | Good alternative |
| Paperspace Gradient | Free tier | 8 GB | Limited | Requires signup |
2.5.2 Paid Options
| Option | GPUs Available | Cost | Notes |
|---|---|---|---|
| Colab Pro | T4, A100 | $10-50/mo | Most convenient |
| Lambda Labs | A100, H100 | ~$1-2/hr | Good for long runs |
| Vast.ai | Various | $0.20+/hr | Cheapest, less reliable |
| RunPod | Various | $0.30+/hr | Good balance |
2.5.3 Recommendation by Use Case
- Learning/tutorials: Free Colab is sufficient
- Running notebooks: Free Colab with T4
- Training SAEs: Paid GPU (A100 recommended)
- Large models (7B+): A100 40GB or better
2.6 Common Issues and Fixes
2.6.1 “CUDA out of memory”
Cause: Model or activations don’t fit in GPU memory.
Fixes:
# 1. Use a smaller model
model = tl.HookedTransformer.from_pretrained("gpt2-small") # Not "gpt2-xl"
# 2. Reduce the batch size or sequence length
tokens = model.to_tokens(text)[:, :512] # e.g. cap the sequence length at 512 tokens
# 3. Clear cache between runs
torch.cuda.empty_cache()
# 4. Use gradient checkpointing for training
model.cfg.use_checkpoint = True
2.6.2 “No module named ‘transformer_lens’”
Cause: Library not installed or wrong environment.
Fixes:
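The details depend on your setup, but a typical sequence of checks, run from the notebook or script that fails, looks like this:
# Check which Python the notebook kernel is actually running
import sys
print(sys.executable)   # should point inside interp-env (or your conda env)

# If it points elsewhere, select the right kernel/interpreter in your IDE,
# or install the library into the kernel's own environment:
%pip install transformer-lens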
2.6.3 “Model not found” or download issues
Cause: Network issues or Hugging Face rate limiting.
Fixes:
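These issues are usually transient or tied to the Hugging Face Hub; the sketch below covers the common cases (the cache path is an example, not a requirement):
import os

# 1. Simply retry: transient network errors are the most common cause.

# 2. If you hit rate limits, authenticate with a (free) Hugging Face account:
#    from huggingface_hub import login; login()

# 3. Point the Hugging Face cache at a disk with enough free space
#    (set this BEFORE importing transformer_lens / transformers):
os.environ["HF_HOME"] = "/path/with/space/huggingface"  # example path

import transformer_lens as tl
model = tl.HookedTransformer.from_pretrained("gpt2-small")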
2.6.4 Slow model loading
Cause: Downloading model weights on every run.
Fix: Models are cached after first download. If still slow:
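A few things worth checking, sketched below (the cache path shown is the default on Linux/macOS):
import os

# 1. Confirm the weights are cached and re-used between runs
#    (default location; override it with the HF_HOME environment variable)
print(os.path.expanduser("~/.cache/huggingface"))

# 2. Load the model once per session and keep reusing the same object,
#    rather than calling from_pretrained() in every cell
import transformer_lens as tl
model = tl.HookedTransformer.from_pretrained("gpt2-small")

# 3. If the cache lives on a slow network drive, point HF_HOME at a local SSD.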
2.6.5 CircuitsVis not rendering
Cause: JavaScript not loading in notebook.
Fixes:
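CircuitsVis returns HTML objects that only render inside a Jupyter or Colab notebook. A typical debugging sequence, assuming the model and cache from the verification code in Step 4 above, looks like this:
import circuitsvis as cv
from IPython.display import display

# 1. Make the visualization the LAST expression in the cell
#    (or display it explicitly), otherwise the HTML is silently dropped.
viz = cv.attention.attention_patterns(
    tokens=model.to_str_tokens("Hello, world!"),
    attention=cache["pattern", 0][0],   # [head, query, key] for layer 0
)
display(viz)

# 2. If nothing appears, restart the kernel and re-run; in JupyterLab,
#    also make sure the notebook is trusted so embedded JavaScript can run.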
2.7 IDE Setup
2.7.1 Jupyter Notebook/Lab
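Nothing special is needed here: with jupyter and ipywidgets installed (Section 2.3), activate your environment, run jupyter lab (or jupyter notebook), and open the companion notebooks as usual.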
2.7.2 VS Code
- Install the Python extension
- Install the Jupyter extension
- Select your interpreter (the virtual environment you created)
- Open a .ipynb file or create a new notebook
Recommended VS Code settings for interpretability work:
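The exact settings are a matter of taste; a minimal, illustrative settings.json (assuming the interp-env from Section 2.3) might look like this:
{
  // Use the interpreter from the virtual environment created above
  "python.defaultInterpreterPath": "${workspaceFolder}/interp-env/bin/python",
  // Don't prompt for confirmation on every kernel restart
  "jupyter.askForKernelRestart": false,
  // Scroll long cell outputs (activation dumps, etc.) instead of truncating them
  "notebook.output.scrolling": true
}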
2.7.3 PyCharm
- Configure the Python interpreter to use your virtual environment
- Enable Scientific Mode for better plot rendering
- Use the built-in Jupyter support
2.8 Testing Your Setup
Run this complete test to verify everything works:
"""
Complete setup verification script.
If this runs without errors, you're ready to go!
"""
import torch
import transformer_lens as tl
from transformer_lens import utils
print("=" * 50)
print("ENVIRONMENT CHECK")
print("=" * 50)
# PyTorch
print(f"\n✓ PyTorch {torch.__version__}")
# CUDA
if torch.cuda.is_available():
print(f"✓ CUDA available: {torch.cuda.get_device_name(0)}")
print(f" VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
print("⚠ No GPU (CPU mode - will be slow for large models)")
# TransformerLens
print(f"✓ TransformerLens {tl.__version__}")
# Load model
print("\nLoading GPT-2 Small...")
model = tl.HookedTransformer.from_pretrained("gpt2-small")
print(f"✓ Model loaded: {model.cfg.n_layers} layers, {model.cfg.n_heads} heads")
# Forward pass
print("\nRunning forward pass...")
prompt = "The capital of France is"
tokens = model.to_tokens(prompt)
logits, cache = model.run_with_cache(tokens)
# Check prediction
next_token = logits[0, -1].argmax()
predicted = model.tokenizer.decode(next_token)
print(f"✓ '{prompt}' → '{predicted}'")
# Check cache
print(f"✓ Cache contains {len(cache)} activation tensors")
# Check SAELens (optional)
try:
import sae_lens
print(f"✓ SAELens {sae_lens.__version__}")
except ImportError:
print("⚠ SAELens not installed (optional)")
# Check CircuitsVis (optional)
try:
import circuitsvis
print("✓ CircuitsVis installed")
except ImportError:
print("⚠ CircuitsVis not installed (optional)")
print("\n" + "=" * 50)
print("SETUP COMPLETE - You're ready to go!")
print("=" * 50)2.9 Next Steps
Now that your environment is ready:
- Your First Analysis — Walk through a complete interpretability analysis
- Quick Reference — Code patterns and cheat sheets
- Chapter 2: Transformers — Understand the architecture
Happy interpreting!