2 Environment Setup

Get your interpretability toolkit running in 10 minutes

This guide walks you through setting up everything you need to run the code in this book and start your own interpretability research.

2.1 Prerequisites Self-Assessment

Before diving in, check whether you have the background you need. Don't worry if you're missing some of it: each check below points to a free resource that fills the gap.

Can you read and understand this code?

# List comprehension
squares = [x**2 for x in range(10)]

# Dictionary operations
data = {"a": 1, "b": 2}
for key, value in data.items():
    print(f"{key}: {value}")

# Function with type hints
def greet(name: str) -> str:
    return f"Hello, {name}!"

If not: Work through Python for Everybody (free) or the first few chapters of any Python tutorial.

Can you understand this code?

import torch

# Create tensors
x = torch.randn(3, 4)  # Random 3x4 matrix
y = torch.zeros(4, 2)  # 4x2 matrix of zeros

# Matrix multiplication
z = x @ y  # Result is 3x2

# Indexing
first_row = x[0]      # Shape: [4]
first_col = x[:, 0]   # Shape: [3]

If not: Complete the PyTorch 60-Minute Blitz (free, ~1 hour).

Do you understand these concepts intuitively?

  • Vectors: A list of numbers representing a point/direction in space
  • Dot product: Measures how aligned two vectors are, scaled by their lengths (for unit vectors it equals the cosine similarity)
  • Matrix multiplication: Transforms vectors from one space to another
  • Orthogonality: Two vectors at 90° have dot product = 0

If not: Watch 3Blue1Brown's Essence of Linear Algebra (free, ~3 hours total; the first 4 videos are enough).
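If you want to check these intuitions concretely, here is a short PyTorch sketch. The vectors are arbitrary examples. Note that the dot product grows with vector length, while cosine similarity does not:

```python
import torch
import torch.nn.functional as F

a = torch.tensor([1.0, 0.0])
b = torch.tensor([0.0, 2.0])
c = torch.tensor([3.0, 0.0])

# Dot product: sum of elementwise products (depends on length AND direction)
dot_ab = a @ b  # 0.0 -> a and b are orthogonal (90 degrees apart)
dot_ac = a @ c  # 3.0 -> same direction, c is 3x longer

# Cosine similarity: dot product of the normalized vectors (direction only)
cos_ac = F.cosine_similarity(a, c, dim=0)  # 1.0 -> perfectly aligned

# Matrix multiplication as a transformation: rotate 2D vectors by 90 degrees
rotate = torch.tensor([[0.0, -1.0],
                       [1.0,  0.0]])
a_rotated = rotate @ a  # tensor([0., 1.]), now orthogonal to a
```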

Do you know what these are?

  • Attention: Mechanism for tokens to “look at” other tokens
  • MLP/FFN: Feed-forward layers that process each position independently
  • Residual stream: The main “highway” that information flows through
  • Embeddings: Converting tokens to vectors

If not: Don’t worry! Chapter 2 covers this from scratch. But if you want a head start, read The Illustrated Transformer (free, ~30 min).
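If you want a head start, here is a toy sketch of how the four pieces fit together in a single layer. It is written in plain PyTorch with random weights. It deliberately omits LayerNorm, multiple heads, and output projections, so treat it as an illustration of the data flow, not as how TransformerLens or any particular model implements a transformer. All variable names are illustrative.

```python
import torch

torch.manual_seed(0)
vocab_size, d_model, seq_len = 10, 8, 5

# Embeddings: a lookup table mapping each token id to a d_model-dim vector
embed = torch.nn.Embedding(vocab_size, d_model)
tokens = torch.tensor([3, 1, 4, 1, 5])
resid = embed(tokens)  # residual stream: [seq_len, d_model]

# Attention (one head): each position mixes in information from earlier positions
W_Q, W_K, W_V = (torch.randn(d_model, d_model) / d_model**0.5 for _ in range(3))
q, k, v = resid @ W_Q, resid @ W_K, resid @ W_V
scores = q @ k.T / d_model**0.5  # [query_pos, key_pos]
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float("-inf"))  # no looking ahead
pattern = scores.softmax(dim=-1)  # each row is a distribution over earlier tokens
resid = resid + pattern @ v  # attention output is ADDED to the residual stream

# MLP: the same feed-forward network applied to each position independently
mlp = torch.nn.Sequential(
    torch.nn.Linear(d_model, 4 * d_model),
    torch.nn.GELU(),
    torch.nn.Linear(4 * d_model, d_model),
)
resid = resid + mlp(resid)  # MLP output is also ADDED to the residual stream
```

The key point is the two `resid = resid + ...` lines. Each component reads from the residual stream and writes its output back into it by addition.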

Ready? If you can read Python and do basic PyTorch tensor operations, you have enough to start. The book explains everything else as needed.


2.3 Local Installation

For serious research, you’ll want a local setup.

2.3.1 Prerequisites

  • Python 3.9+ (3.10 or 3.11 recommended)
  • pip or conda
  • (Optional) NVIDIA GPU with CUDA

2.3.3 Option B: Using conda

# Create environment
conda create -n interp python=3.11
conda activate interp

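# Install PyTorch with CUDA 11.8 (check pytorch.org for the command that
# matches your GPU driver and the PyTorch version you want)
conda install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia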

# Or CPU only:
conda install pytorch cpuonly -c pytorch

# Install interpretability libraries
pip install transformer-lens sae-lens einops jaxtyping circuitsvis plotly

# Install Jupyter
conda install jupyter ipywidgets

2.3.4 Verify Local Installation

import torch
import transformer_lens as tl
from importlib.metadata import version

print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"TransformerLens: {version('transformer-lens')}")

# Quick test
model = tl.HookedTransformer.from_pretrained("gpt2-small")
logits, cache = model.run_with_cache("Test")
print("✓ Installation successful!")
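The check above only reports whether CUDA is available. A small helper like the one below also picks up Apple Silicon GPUs (the MPS backend), which `torch.cuda.is_available()` does not detect. The name `pick_device` is our own and is not part of PyTorch or TransformerLens.

```python
import torch

def pick_device() -> str:
    """Return the best available device: CUDA GPU, Apple Silicon GPU (MPS), or CPU."""
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps reports Metal (Apple Silicon) GPU support
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(f"Using device: {device}")
x = torch.randn(2, 3, device=device)  # tensors can be created directly on that device
```

You can pass the result when loading a model, e.g. `tl.HookedTransformer.from_pretrained("gpt2-small", device=device)`. Some operations may not be implemented on MPS, so fall back to `"cpu"` if you hit errors there.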

2.4 Library Reference

2.4.1 Core Libraries

Library            Purpose                                               Install
transformer-lens   Load models, access activations, run interventions    pip install transformer-lens
sae-lens           Train and use sparse autoencoders                     pip install sae-lens
einops             Tensor operations with readable syntax                pip install einops
circuitsvis        Visualize attention patterns                          pip install circuitsvis

2.4.2 Optional Libraries

Library    Purpose                             Install
plotly     Interactive plots                   pip install plotly
wandb      Experiment tracking                 pip install wandb
nnsight    Alternative intervention library    pip install nnsight
baukit     Activation editing utilities        pip install baukit
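You can check which of the libraries listed above are already installed by querying package metadata with the standard library's `importlib.metadata`. This is a small sketch of our own, not a feature of any of these libraries, and `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the library tables (the names you pass to pip)
LIBRARIES = [
    "transformer-lens", "sae-lens", "einops", "circuitsvis",  # core
    "plotly", "wandb", "nnsight", "baukit",                   # optional
]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result

for name, ver in installed_versions(LIBRARIES).items():
    print(f"{'✓' if ver else '✗'} {name:<18} {ver or 'not installed'}")
```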

2.5 GPU Options

2.5.1 Free Options

Option               GPU        VRAM   Time limit            Notes
Google Colab         T4         15 GB  Up to 12 hrs/session  Best free option; usage limits vary
Kaggle Notebooks     P100 / T4  16 GB  30 hrs/week           Good alternative
Paperspace Gradient  Free tier  8 GB   Limited               Requires signup

2.5.3 Recommendation by Use Case

  • Learning/tutorials: Free Colab is sufficient
  • Running notebooks: Free Colab with T4
  • Training SAEs: Paid GPU (A100 recommended)
  • Large models (7B+): A100 40GB or better
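These recommendations follow from simple arithmetic. The weights alone need (number of parameters) × (bytes per parameter). Activations, a `run_with_cache` cache, and, when training, gradients and optimizer state all come on top of that, so treat the result as a lower bound. Here is a back-of-the-envelope sketch; the helper name is hypothetical.

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

def weight_memory_gb(n_params: float, dtype: str = "float32") -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

print(weight_memory_gb(124e6))           # GPT-2 Small (~124M params), float32: ~0.5 GB
print(weight_memory_gb(7e9, "float16"))  # 7B model, float16: 14.0 GB
print(weight_memory_gb(7e9, "float32"))  # 7B model, float32: 28.0 GB
```

A 7B model in float32 takes 28 GB, which leaves limited headroom for activations even on a 40 GB card. This is why half precision is common at that scale.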

2.6 Common Issues and Fixes

2.6.1 “CUDA out of memory”

Cause: Model or activations don’t fit in GPU memory.

Fixes:

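# 1. Use a smaller model
model = tl.HookedTransformer.from_pretrained("gpt2-small")  # Not "gpt2-xl"

# 2. Reduce sequence length (and process fewer prompts at a time)
tokens = model.to_tokens(text)[:, :512]  # Keep at most 512 tokens per sequence

# 3. Release cached GPU memory between runs
# (empty_cache() only frees memory that is no longer referenced,
# so `del` large objects such as an old activation cache first)
torch.cuda.empty_cache()

# 4. For analysis, skip gradient tracking and cache only what you need
with torch.no_grad():
    logits, cache = model.run_with_cache(
        tokens, names_filter=lambda name: name.endswith("hook_resid_post")
    )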
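If you have many prompts, processing them in smaller batches caps peak memory. The sketch below is a generic helper. The name `run_in_batches` is hypothetical and is not a TransformerLens function. For a `HookedTransformer`, `fn` could be the model itself, since calling it on a token tensor returns logits.

```python
import torch

def run_in_batches(fn, inputs: torch.Tensor, batch_size: int = 8) -> torch.Tensor:
    """Apply fn to `inputs` batch_size rows at a time and concatenate the results.

    Peak memory then scales with batch_size instead of the full input size.
    """
    outputs = []
    with torch.no_grad():  # analysis only: don't keep activations for backprop
        for start in range(0, inputs.shape[0], batch_size):
            batch = inputs[start : start + batch_size]
            outputs.append(fn(batch).cpu())  # move each result off the GPU as we go
    return torch.cat(outputs, dim=0)

# Stand-in example: 10 rows processed in batches of 4, 4, and 2
x = torch.randn(10, 3)
doubled = run_in_batches(lambda b: b * 2, x, batch_size=4)
```

With a real model this would look like `logits = run_in_batches(model, tokens, batch_size=4)`. Note that prompts of different lengths must be padded to the same length before they can share a tensor.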

2.6.2 “No module named ‘transformer_lens’”

Cause: Library not installed or wrong environment.

Fixes:

# Make sure you're in the right environment
which python  # Should point into your virtual environment (Windows: where python)

# Reinstall using the pip that belongs to that interpreter
python -m pip install --upgrade transformer-lens
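In notebooks, the kernel may run a different interpreter from your shell. Running a check like this inside the notebook shows which Python is actually in use and whether it can find the package. Note that the import name (`transformer_lens`) differs from the pip name (`transformer-lens`). The helper name `diagnose` is hypothetical.

```python
import importlib.util
import sys

def diagnose(module_name: str = "transformer_lens") -> bool:
    """Print the running interpreter and whether `module_name` is importable."""
    # The interpreter executing this code (compare with `which python` in your shell)
    print(f"Interpreter: {sys.executable}")
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        print(f"✗ {module_name} is not importable from this interpreter")
        return False
    print(f"✓ {module_name} found at {spec.origin}")
    return True

diagnose()
```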

2.6.3 “Model not found” or download issues

Cause: Network issues or Hugging Face rate limiting.

Fixes:

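# Set the cache directory *before* importing transformer_lens or transformers:
# Hugging Face libraries read these variables when they are first imported
import os
os.environ["HF_HOME"] = "/path/to/cache"
# After a successful first download, optionally use cached files only (no network):
# os.environ["HF_HUB_OFFLINE"] = "1"

# Or pass a cache directory when loading
# (extra keyword arguments are forwarded to Hugging Face's from_pretrained)
import transformer_lens as tl
model = tl.HookedTransformer.from_pretrained(
    "gpt2-small",
    cache_dir="/path/to/models"
)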

2.6.4 Slow model loading

Cause: Downloading model weights on every run.

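Fix: Weights are cached after the first download, so repeated downloads usually mean that the cache is being wiped (for example, on a fresh Colab VM) or that it lives somewhere you didn't expect. TransformerLens fetches weights through Hugging Face, so check the Hugging Face cache:

# Check where Hugging Face caches downloaded weights
from huggingface_hub import constants
print(constants.HF_HUB_CACHE)

# Point the cache at persistent storage (set this before importing transformer_lens)
import os
os.environ["HF_HOME"] = "/path/to/persistent/cache"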
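To see which models are already in that cache, the official tool is `huggingface-cli scan-cache`. The sketch below does a rough equivalent in plain Python. It assumes the Hugging Face hub cache layout, in which each model repo is stored in a folder named `models--<org>--<name>`. The helper name `list_cached_models` is hypothetical.

```python
from pathlib import Path

def list_cached_models(hub_cache: str) -> list[str]:
    """List model repo ids found in a Hugging Face hub cache directory.

    Assumes folders named "models--<org>--<name>" (or "models--<name>").
    """
    root = Path(hub_cache)
    if not root.is_dir():
        return []
    return sorted(
        p.name[len("models--"):].replace("--", "/")
        for p in root.iterdir()
        if p.is_dir() and p.name.startswith("models--")
    )
```

For example, `list_cached_models(os.path.expanduser("~/.cache/huggingface/hub"))` checks the default location. If GPT-2's weights are missing from the list after a successful load, the cache is not persisting between sessions.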

2.6.5 CircuitsVis not rendering

Cause: JavaScript not loading in notebook.

Fixes:

# In Colab, enable widgets
from google.colab import output
output.enable_custom_widget_manager()

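# Or render the HTML yourself
import circuitsvis as cv
from IPython.display import HTML, display

# tokens: list of token strings, e.g. model.to_str_tokens(prompt)
# patterns: tensor of shape [n_heads, query_pos, key_pos], e.g. cache["pattern", 0][0]
html = cv.attention.attention_patterns(tokens, patterns)
display(HTML(html._repr_html_()))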

2.7 IDE Setup

2.7.1 Jupyter Notebook/Lab

# Install
pip install jupyter jupyterlab

# Run Jupyter Lab (recommended)
jupyter lab

# Or classic notebook
jupyter notebook

2.7.2 VS Code

  1. Install the Python extension
  2. Install the Jupyter extension
  3. Select your interpreter (the virtual environment you created)
  4. Open a .ipynb file or create a new notebook

Recommended VS Code settings for interpretability work:

{
    "python.analysis.typeCheckingMode": "off",
    "jupyter.askForKernelRestart": false,
    "editor.formatOnSave": true
}

2.7.3 PyCharm

  1. Configure the Python interpreter to use your virtual environment
  2. Enable Scientific Mode for better plot rendering
  3. Use the built-in Jupyter support

2.8 Testing Your Setup

Run this complete test to verify everything works:

"""
Complete setup verification script.
If this runs without errors, you're ready to go!
"""

import torch
import transformer_lens as tl

print("=" * 50)
print("ENVIRONMENT CHECK")
print("=" * 50)

# PyTorch
print(f"\n✓ PyTorch {torch.__version__}")

# CUDA
if torch.cuda.is_available():
    print(f"✓ CUDA available: {torch.cuda.get_device_name(0)}")
    print(f"  VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("⚠ No GPU (CPU mode - will be slow for large models)")

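# TransformerLens (version read from the installed package metadata)
from importlib.metadata import version
print(f"✓ TransformerLens {version('transformer-lens')}")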

# Load model
print("\nLoading GPT-2 Small...")
model = tl.HookedTransformer.from_pretrained("gpt2-small")
print(f"✓ Model loaded: {model.cfg.n_layers} layers, {model.cfg.n_heads} heads")

# Forward pass
print("\nRunning forward pass...")
prompt = "The capital of France is"
tokens = model.to_tokens(prompt)
logits, cache = model.run_with_cache(tokens)

# Check prediction
next_token = logits[0, -1].argmax()
predicted = model.tokenizer.decode(next_token)
print(f"✓ '{prompt}' → '{predicted}'")

# Check cache
print(f"✓ Cache contains {len(cache)} activation tensors")

# Check SAELens (optional)
try:
    import sae_lens
    print(f"✓ SAELens {sae_lens.__version__}")
except ImportError:
    print("⚠ SAELens not installed (optional)")

# Check CircuitsVis (optional)
try:
    import circuitsvis
    print("✓ CircuitsVis installed")
except ImportError:
    print("⚠ CircuitsVis not installed (optional)")

print("\n" + "=" * 50)
print("SETUP COMPLETE - You're ready to go!")
print("=" * 50)

2.9 Next Steps

Now that your environment is ready:

  1. Your First Analysis — Walk through a complete interpretability analysis
  2. Quick Reference — Code patterns and cheat sheets
  3. Chapter 2: Transformers — Understand the architecture

Happy interpreting!