Stage 9 Exercises¶
Conceptual Questions¶
Exercise 9.1: The Fine-tuning Problem¶
Consider a 7B parameter model that you want to fine-tune for a specific task.
a) How much GPU memory is needed for full fine-tuning in FP16?
b) Why does fine-tuning all parameters often lead to worse generalization?
c) What does "catastrophic forgetting" mean in this context?
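For part (a), a rough back-of-the-envelope estimate helps: assuming FP16 weights and gradients plus FP32 Adam moments (a common mixed-precision setup; an FP32 master copy of the weights would add another ~28 GB, and activation memory comes on top):

params = 7e9
weights = params * 2          # FP16 weights: 2 bytes each, ~14 GB
grads   = params * 2          # FP16 gradients: ~14 GB
adam    = params * 4 * 2      # two FP32 Adam moments (m and v): ~56 GB
print(f"~{(weights + grads + adam) / 1e9:.0f} GB before activations")  # ~84 GB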
Exercise 9.2: LoRA Mathematics¶
For LoRA with rank r=8 applied to a 4096×4096 weight matrix:
a) How many trainable parameters does LoRA add?
b) What percentage is this of the original matrix?
c) After training, can we recover a single weight matrix? How?
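One way to check your arithmetic for (a) and (b) numerically (the merge in (c) follows the convention of Exercise 9.5 below):

d, r = 4096, 8
lora_params = r * d + d * r   # A is (r, d), B is (d, r)
full_params = d * d
print(lora_params, f"{100 * lora_params / full_params:.2f}%")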
Exercise 9.3: Method Comparison¶
Compare LoRA, Adapters, and Prompt Tuning:
a) Which method adds parameters during inference?
b) Which method is easiest to combine for multi-task models?
c) Which method requires changing the model architecture?
Exercise 9.4: Rank Selection¶
Why is rank important in LoRA?
a) What happens if the rank is too low?
b) What happens if the rank is too high?
c) How would you choose the rank for a new task?
Implementation Exercises¶
Exercise 9.5: LoRA Layer¶
Implement the core LoRA layer:
import numpy as np

class LoRALayer:
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16):
        """
        Initialize LoRA layer.

        Args:
            in_features: Input dimension
            out_features: Output dimension
            rank: LoRA rank (r)
            alpha: Scaling factor
        """
        self.scaling = alpha / rank
        # TODO: Initialize A and B matrices
        # A: (rank, in_features) - Gaussian init
        # B: (out_features, rank) - Zero init (so the LoRA update starts at zero)
        pass

    def forward(self, x: np.ndarray, W: np.ndarray) -> np.ndarray:
        """
        Forward pass: Wx + scaling * (B @ A @ x)

        Args:
            x: Input [batch, seq, in_features]
            W: Frozen pretrained weights [out_features, in_features]
        """
        # TODO
        pass

    def merge(self, W: np.ndarray) -> np.ndarray:
        """Merge LoRA into base weights for inference: W + scaling * (B @ A)."""
        # TODO
        pass
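Once the TODOs are filled in, a quick self-check of the merge path. This assumes you store the low-rank factors as attributes named A and B; adjust to your own names. B is perturbed because with the zero init the check would pass trivially:

layer = LoRALayer(in_features=64, out_features=32, rank=4)
layer.B = np.random.randn(32, 4) * 0.01   # make the LoRA branch non-zero
W = np.random.randn(32, 64)
x = np.random.randn(2, 10, 64)

assert np.allclose(layer.forward(x, W), x @ layer.merge(W).T, atol=1e-5)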
Exercise 9.6: LoRA Backward Pass¶
Implement gradients for LoRA:
from typing import Tuple

import numpy as np

def lora_backward(
    grad_output: np.ndarray,
    x: np.ndarray,
    A: np.ndarray,
    B: np.ndarray,
    scaling: float,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """
    Backward pass through LoRA.

    Args:
        grad_output: Gradient of loss w.r.t. output
        x: Input (cached from forward)
        A, B: LoRA matrices
        scaling: LoRA scaling factor

    Returns:
        (grad_A, grad_B, grad_x)
    """
    # TODO: Compute gradients
    # Note: W is frozen, so no gradient is needed for it
    pass
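A finite-difference check is the easiest way to validate your gradients. Here is a minimal sketch for grad_A, assuming the forward convention from Exercise 9.5 and a scalar loss of sum(output); it is slow, but fine for small shapes:

def numerical_grad_A(x, A, B, scaling, eps=1e-5):
    """Finite-difference gradient of sum(LoRA output) w.r.t. A."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            A_plus, A_minus = A.copy(), A.copy()
            A_plus[i, j] += eps
            A_minus[i, j] -= eps
            f_plus = np.sum(scaling * x @ A_plus.T @ B.T)
            f_minus = np.sum(scaling * x @ A_minus.T @ B.T)
            grad[i, j] = (f_plus - f_minus) / (2 * eps)
    return grad

# Compare against your analytic grad_A with grad_output = np.ones_like(output).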
Exercise 9.7: Adapter Layer¶
Implement an adapter bottleneck layer:
import numpy as np

class Adapter:
    def __init__(self, d_model: int, bottleneck: int = 64):
        """
        Adapter layer: down-project -> activation -> up-project -> residual

        Args:
            d_model: Model dimension
            bottleneck: Bottleneck dimension
        """
        # TODO: Initialize W_down (d_model -> bottleneck) and W_up (bottleneck -> d_model)
        # Hint: zero-initializing W_up makes the adapter start as an identity function
        pass

    def forward(self, x: np.ndarray) -> np.ndarray:
        """
        Forward: x + up(relu(down(x)))
        """
        # TODO
        pass
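For reference, the adapter adds roughly 2 * d_model * bottleneck parameters per layer (plus biases, if you use them):

d_model, bottleneck = 4096, 64
print(2 * d_model * bottleneck)  # 524,288 parameters per adapter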
Exercise 9.8: Prompt Tuning¶
Implement soft prompt tuning:
import numpy as np

class PromptTuning:
    def __init__(self, d_model: int, prompt_length: int = 20):
        """
        Initialize soft prompts.

        Args:
            d_model: Embedding dimension
            prompt_length: Number of soft prompt tokens
        """
        # TODO: Initialize learnable prompt embeddings of shape (prompt_length, d_model)
        pass

    def forward(self, input_embeddings: np.ndarray) -> np.ndarray:
        """
        Prepend soft prompts to the input.

        Args:
            input_embeddings: [batch, seq, d_model]

        Returns:
            [batch, prompt_length + seq, d_model]
        """
        # TODO
        pass
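A quick shape check once forward is implemented (np.concatenate along the sequence axis is the natural tool; remember that in a full model, any attention mask must be extended to cover the prompt tokens too):

pt = PromptTuning(d_model=512, prompt_length=20)
emb = np.random.randn(4, 100, 512)
assert pt.forward(emb).shape == (4, 120, 512)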
Challenge Exercises¶
Exercise 9.9: LoRA for Multi-Head Attention¶
Apply LoRA to a full multi-head attention layer:
from typing import Sequence

import numpy as np

class LoRAMultiHeadAttention:
    def __init__(
        self,
        d_model: int,
        n_heads: int,
        rank: int = 8,
        target_modules: Sequence[str] = ('q', 'v'),
    ):
        """
        Multi-head attention with LoRA on selected projections.

        Args:
            d_model: Model dimension
            n_heads: Number of heads
            rank: LoRA rank
            target_modules: Which projections to apply LoRA to
                ('q', 'k', 'v', 'o')
        """
        # TODO: Initialize attention with LoRA on target modules
        pass

    def forward(self, x: np.ndarray, mask=None) -> np.ndarray:
        """Forward pass with LoRA-enhanced projections."""
        # TODO
        pass

    def merge_lora(self):
        """Merge all LoRA weights into base weights."""
        # TODO
        pass

    def count_trainable_params(self) -> int:
        """Count only LoRA parameters."""
        # TODO
        pass
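For count_trainable_params, note that each targeted d_model×d_model projection adds an A of shape (rank, d_model) and a B of shape (d_model, rank), i.e. 2 * rank * d_model parameters, so the expected count is easy to predict:

d_model, rank = 512, 8
n_targets = 2                            # LoRA on 'q' and 'v'
print(n_targets * 2 * rank * d_model)    # 16,384 trainable parameters for this layer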
Exercise 9.10: Parameter Efficiency Analysis¶
Build a tool to analyze PEFT efficiency:
from typing import Any, Dict

def analyze_peft_efficiency(
    model_config: Dict[str, int],   # d_model, n_layers, n_heads
    method: str,                    # 'lora', 'adapter', 'prefix', 'prompt'
    method_config: Dict[str, Any],  # rank, bottleneck, length, etc.
) -> Dict[str, Any]:
    """
    Analyze parameter efficiency of a PEFT method.

    Returns:
        - trainable_params: Number of trainable parameters
        - total_params: Total model parameters
        - efficiency_ratio: trainable / total
        - memory_estimate: Approximate memory savings
    """
    # TODO
    pass
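An example call your tool might support (the config keys here are suggestions, not a fixed API):

report = analyze_peft_efficiency(
    model_config={'d_model': 4096, 'n_layers': 32, 'n_heads': 32},
    method='lora',
    method_config={'rank': 8, 'target_modules': ('q', 'v')},
)
print(report['trainable_params'], report['efficiency_ratio'])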
Exercise 9.11: LoRA Merging and Switching¶
Implement LoRA weight management for multi-task serving:
from typing import Dict, List

class LoRAManager:
    def __init__(self, base_model):
        self.base_model = base_model
        self.adapters = {}  # task_name -> LoRA weights

    def add_adapter(self, name: str, lora_weights: Dict):
        """Register a new LoRA adapter."""
        # TODO
        pass

    def switch_adapter(self, name: str):
        """Switch to a different adapter."""
        # TODO
        pass

    def merge_adapter(self, name: str) -> None:
        """Permanently merge adapter into base weights."""
        # TODO
        pass

    def combine_adapters(self, names: List[str], weights: List[float]) -> None:
        """Linearly combine multiple adapters."""
        # TODO: Implement a weighted average of the adapters' delta weights
        pass
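One subtlety for combine_adapters: each adapter contributes a delta matrix ΔW = scaling * B @ A, and those deltas combine linearly, whereas averaging the A and B factors separately does not give the same result. A sketch of the intended combination for a single layer (the dict keys 'A', 'B', 'scaling' are illustrative):

def combined_delta(adapters, names, weights):
    """Weighted sum of per-adapter delta matrices for one layer."""
    delta = None
    for name, w in zip(names, weights):
        a = adapters[name]
        d = w * a['scaling'] * (a['B'] @ a['A'])
        delta = d if delta is None else delta + d
    return delta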
Exercise 9.12: QLoRA Basics¶
Explore quantized LoRA concepts:
from typing import Tuple

import numpy as np

def quantize_to_4bit(weight: np.ndarray) -> Tuple[np.ndarray, float, float]:
    """
    Quantize weight to a 4-bit representation (16 levels).

    Returns:
        (quantized_weight, scale, zero_point)
    """
    # TODO: Implement simple 4-bit quantization
    pass

def dequantize_4bit(
    quantized: np.ndarray,
    scale: float,
    zero_point: float,
) -> np.ndarray:
    """Dequantize 4-bit weight back to float."""
    # TODO
    pass

class QLoRALayer:
    """LoRA with 4-bit quantized base weights."""

    def __init__(self, weight_4bit, scale, zero_point, rank: int = 8):
        # TODO: Store the quantized base weights; keep the LoRA A/B matrices in full precision
        pass

    def forward(self, x: np.ndarray) -> np.ndarray:
        # TODO: Dequantize the base weights on the fly, then apply Wx + scaling * (B @ A @ x)
        pass
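A round-trip check for your quantizer, assuming an unsigned [0, 15] integer range (which matches the (scale, zero_point) signature above); with rounding, the reconstruction error should be bounded by about half a quantization step:

W = np.random.randn(128, 128).astype(np.float32)
q, scale, zero_point = quantize_to_4bit(W)
W_hat = dequantize_4bit(q, scale, zero_point)

assert q.min() >= 0 and q.max() <= 15            # values fit in 4 bits
assert np.max(np.abs(W - W_hat)) <= scale / 2 + 1e-6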
Checking Your Work¶
- Test suite: See code/stage-09/tests/test_peft.py for expected behavior
- Reference implementation: Compare with code/stage-09/peft.py
- Self-check: Verify that LoRA merging produces the same output as the unmerged forward pass
Mini-Project: LoRA Fine-tuning¶
Implement LoRA and use it to fine-tune a model for a specific task.
Requirements¶
- LoRA: Implement LoRA layers from scratch
- Task: Fine-tune for a simple task (e.g., sentiment, simple QA)
- Comparison: Compare parameter count and performance
Deliverables¶
- [ ] LoRA implementation with merge capability
- [ ] Fine-tuned model on chosen task
- [ ] Comparison table:

| Method         | Params | Accuracy |
|----------------|--------|----------|
| Full fine-tune | ?      | ?        |
| LoRA r=4       | ?      | ?        |
| LoRA r=16      | ?      | ?        |
- [ ] Analysis of rank vs. performance
Extension¶
Implement QLoRA with 4-bit quantization of base weights.