# Further Reading & Resources
Curated resources for going deeper
This page collects the most valuable external resources for each topic covered in this book.
## Papers
### Foundational Papers
| Paper | Year | Key Contribution | Stage |
|---|---|---|---|
| Attention Is All You Need | 2017 | The transformer architecture | 5, 6 |
| A Neural Probabilistic Language Model | 2003 | Word embeddings for language modeling | 3 |
| Adam: A Method for Stochastic Optimization | 2014 | The Adam optimizer | 4 |
| Layer Normalization | 2016 | LayerNorm for transformers | 6 |
| Deep Residual Learning for Image Recognition | 2015 | Residual connections | 6 |
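Several of these papers boil down to a few lines of NumPy. As one example, here is a minimal sketch of the Adam update rule from the 2014 paper above, using its standard default hyperparameters (the `adam_step` helper is illustrative, not from any library):

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: moment estimates, bias correction, then the step."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (running mean of grads)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (running mean of squared grads)
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Usage: minimize f(x) = x^2 starting from x = 5.0
x, m, v = np.array(5.0), 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
```

The bias-correction terms are what distinguish Adam from plain RMSProp-with-momentum: without them, the zero-initialized moments bias the first steps toward zero.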
### Tokenization Papers
| Paper | Year | Key Contribution | Stage |
|---|---|---|---|
| Neural Machine Translation of Rare Words with Subword Units | 2016 | BPE for NLP | 7 |
| Google's Neural Machine Translation System | 2016 | WordPiece | 7 |
| SentencePiece | 2018 | Language-independent BPE and unigram tokenization | 7 |
### PEFT Papers
| Paper | Year | Key Contribution | Stage |
|---|---|---|---|
| LoRA: Low-Rank Adaptation of Large Language Models | 2021 | Low-rank fine-tuning | 9 |
| Parameter-Efficient Transfer Learning for NLP | 2019 | Adapter layers | 9 |
| Prefix-Tuning | 2021 | Soft prefixes | 9 |
| The Power of Scale for Parameter-Efficient Prompt Tuning | 2021 | Prompt tuning | 9 |
| QLoRA: Efficient Finetuning of Quantized LLMs | 2023 | 4-bit fine-tuning | 9 |
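The core idea of the LoRA paper fits in a few lines: freeze the pre-trained weight `W` and learn a low-rank update `(alpha/r) * B @ A`. A minimal NumPy sketch (illustrative variable names, not the PEFT library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 8, 16       # r << d is the low-rank bottleneck

W = rng.normal(size=(d_out, d_in))           # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01        # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-initialized

def lora_forward(x):
    """h = W x + (alpha/r) * B A x — only A and B are updated during fine-tuning."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Because B starts at zero, the adapted model matches the base model exactly
# at initialization — fine-tuning starts from the pre-trained behavior.
assert np.allclose(lora_forward(x), W @ x)
```

Note the parameter saving: `A` and `B` together hold `(d_in + d_out) * r = 1024` trainable values versus `4096` in `W`, and the ratio improves as `d` grows.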
### Alignment Papers
| Paper | Year | Key Contribution | Stage |
|---|---|---|---|
| Training Language Models to Follow Instructions with Human Feedback | 2022 | InstructGPT, RLHF | 10 |
| Direct Preference Optimization | 2023 | DPO | 10 |
| Constitutional AI | 2022 | Self-critique | 10 |
| Proximal Policy Optimization Algorithms | 2017 | PPO | 10 |
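DPO's appeal over PPO-based RLHF is that its objective is a simple classification-style loss on preference pairs, with no reward model or rollout loop. A sketch of the per-pair loss from the 2023 paper (the `dpo_loss` helper and its argument names are illustrative):

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin), where the
    margin compares the policy's log-prob gain over the reference model on the
    chosen response versus the rejected one."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-beta * margin)))

# With zero margin (policy == reference behavior), the loss is exactly log(2);
# as the policy learns to prefer the chosen response more than the reference
# does, the margin grows and the loss falls toward zero.
assert abs(dpo_loss(-10.0, -12.0, -10.0, -12.0) - np.log(2)) < 1e-9
assert dpo_loss(-10.0, -12.0, -11.0, -11.0) < np.log(2)
```

`beta` controls how strongly the policy is penalized for drifting from the reference model, playing a role analogous to the KL coefficient in RLHF.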
### Scaling & Architecture Papers
| Paper | Year | Key Contribution | Stage |
|---|---|---|---|
| Scaling Laws for Neural Language Models | 2020 | Power-law scaling of loss with model size, data, and compute | 6 |
| LLaMA: Open and Efficient Foundation Language Models | 2023 | Modern architecture | 6 |
| Language Models are Unsupervised Multitask Learners | 2019 | GPT-2 | 6 |
| RoFormer: Enhanced Transformer with Rotary Position Embedding | 2021 | RoPE | 5 |
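RoPE, from the RoFormer paper above, encodes position by rotating consecutive pairs of query/key dimensions by position-dependent angles. A minimal NumPy sketch (the `rope` helper is illustrative) demonstrating its key property, that attention scores depend only on *relative* position:

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position embedding: rotate consecutive dimension pairs of x
    by angles pos * theta_i, where theta_i = base**(-2i/d)."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)   # one frequency per dimension pair
    ang = pos * theta
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin        # standard 2-D rotation per pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

q, k = np.random.default_rng(0).normal(size=(2, 8))
# The dot product of rotated vectors depends only on the position offset:
a = rope(q, 5) @ rope(k, 3)      # offset 2, at positions (5, 3)
b = rope(q, 12) @ rope(k, 10)    # offset 2, at positions (12, 10)
assert np.allclose(a, b)
```

This relative-position property is why RoPE composes cleanly with the attention dot product and why it has become the default in LLaMA-style architectures.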
## Libraries & Tools
### Essential Libraries
| Library | Purpose | Relevant Stages |
|---|---|---|
| NumPy | Array operations (used throughout this book) | All |
| PyTorch | Production deep learning | All |
| JAX | Autodiff and accelerators | 2 |
| Hugging Face Transformers | Pre-trained models | 6, 9 |
| PEFT | LoRA and adapters | 9 |
| TRL | RLHF and DPO | 10 |
### Tokenization Libraries
| Library | Purpose |
|---|---|
| tiktoken | OpenAI's BPE tokenizer |
| SentencePiece | Unigram and BPE |
| tokenizers | Fast tokenization |
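All three libraries implement variants of the same core training loop: repeatedly merge the most frequent adjacent symbol pair. A toy pure-Python sketch of that loop (the `bpe_train` and `merge_pair` helpers are illustrative, not any library's API):

```python
from collections import Counter

def merge_pair(word, pair):
    """Replace every occurrence of the adjacent symbol pair with one merged symbol."""
    out, i = [], 0
    while i < len(word):
        if word[i:i + 2] == pair:
            out.append(word[i] + word[i + 1])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return tuple(out)

def bpe_train(words, num_merges):
    """Toy BPE trainer: learn merges greedily by pair frequency."""
    vocab = {tuple(w): c for w, c in Counter(words).items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, count in vocab.items():
            for pair in zip(word, word[1:]):
                pairs[pair] += count
        if not pairs:
            break
        best = max(pairs, key=pairs.get)         # most frequent adjacent pair
        merges.append(best)
        vocab = {merge_pair(word, best): c for word, c in vocab.items()}
    return merges

# On a tiny corpus, the first merges capture the shared stem "low":
merges = bpe_train(["low", "lower", "lowest", "low"], num_merges=3)
```

Production tokenizers add byte-level fallback, pre-tokenization rules, and heavy optimization on top of this loop, but the greedy merge procedure is the same.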
### Training Tools
| Library | Purpose |
|---|---|
| Weights & Biases | Experiment tracking |
| TensorBoard | Training visualization |
| DeepSpeed | Distributed training |
| Accelerate | Multi-GPU training |
## Books
### Machine Learning Foundations
| Book | Author(s) | Focus |
|---|---|---|
| Deep Learning | Goodfellow, Bengio, Courville | Comprehensive ML theory |
| Pattern Recognition and Machine Learning | Bishop | Probabilistic ML |
| The Elements of Statistical Learning | Hastie, Tibshirani, Friedman | Statistical methods |
### NLP & Language Models
| Book | Author(s) | Focus |
|---|---|---|
| Speech and Language Processing | Jurafsky & Martin | NLP foundations |
| Natural Language Processing with Transformers | Tunstall, von Werra, Wolf | Practical transformers |
| Dive into Deep Learning | Zhang et al. | Interactive ML book |
## Courses
| Course | Institution | Focus |
|---|---|---|
| CS231n | Stanford | CNNs, backprop basics |
| CS224n | Stanford | NLP with deep learning |
| CS324 | Stanford | Large language models |
| fast.ai | fast.ai | Practical deep learning |
## Blog Posts & Tutorials
### Understanding Transformers
- The Illustrated Transformer - Visual walkthrough
- The Annotated Transformer - Code walkthrough
- Transformer Math 101 - Memory and compute
### Understanding Training
- A Recipe for Training Neural Networks - Karpathy's practical guide
- Why Momentum Really Works - Visual explanation
### Understanding Alignment
- RLHF: Reinforcement Learning from Human Feedback - Hugging Face overview
- Illustrating RLHF - Visual guide
## Codebases to Study
### Educational Implementations
| Repo | Author | What to Learn |
|---|---|---|
| nanoGPT | Karpathy | Minimal GPT training |
| minGPT | Karpathy | Simple GPT implementation |
| micrograd | Karpathy | Tiny autograd engine |
| llm.c | Karpathy | GPT in C |
### Production Implementations
| Repo | What to Learn |
|---|---|
| llama | Production transformer |
| transformers | Library architecture |
| vLLM | Inference optimization |
## Datasets
### Language Modeling
| Dataset | Size | Use Case |
|---|---|---|
| TinyStories | Small | Learning, debugging |
| OpenWebText | Medium | GPT-2 reproduction |
| The Pile | Large | Serious pre-training |
| RedPajama | Large | LLaMA reproduction |
### Alignment
| Dataset | Purpose |
|---|---|
| Anthropic HH-RLHF | Preference data |
| OpenAssistant | Conversation data |
| Alpaca | Instruction data |
## Communities
- Hugging Face Forums - Library questions
- r/MachineLearning - Research discussion
- r/LocalLLaMA - Running LLMs locally
- EleutherAI Discord - Open-source LLMs
## Staying Current
### Research Feeds
- Papers With Code - Language Models
- arXiv cs.CL - NLP papers
- arXiv cs.LG - ML papers