Perf Bits
A collection of articles exploring software performance optimization techniques across different languages and paradigms.
Articles
Involutions: The License to Update in O(1)
Why Zobrist hashing updates a chessboard in constant time, and the property that makes incremental computation possible.
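The constant-time update the teaser describes rests on XOR being its own inverse. A minimal sketch (a hypothetical 4-square, 2-piece toy board, not the article's code):

```python
import random

random.seed(0)
# One random 64-bit key per (square, piece) pair — a toy 4-square, 2-piece board.
KEYS = [[random.getrandbits(64) for _ in range(2)] for _ in range(4)]

def toggle(h, square, piece):
    # XOR with a fixed key is an involution: applying the same key twice
    # restores h, so placing and removing a piece are the same O(1) update.
    return h ^ KEYS[square][piece]

h = 0
h = toggle(h, 2, 1)   # place piece 1 on square 2
h = toggle(h, 2, 1)   # remove it again
assert h == 0         # back to the empty-board hash
```

The full hash never needs recomputing from scratch; every move is two XORs.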
Beyond Snowflake: A Problem-Solver's Guide to Distributed ID Generation
How memory allocators teach us to generate unique IDs at scale—and how to discover solutions, not just memorize them.
Quantization: The License to Approximate
Why 4 bits can do the work of 32, and the mathematical reason neural networks tolerate imprecision.
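The core mechanics can be sketched with symmetric linear quantization (an illustrative toy, assuming signed 4-bit integers and a single per-tensor scale):

```python
def quantize(xs, bits=4):
    # Map floats onto signed ints in [-2^(b-1), 2^(b-1) - 1] via one scale factor.
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in xs)
    scale = amax / qmax if amax else 1.0
    q = [max(-qmax - 1, min(qmax, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate floats; the error is bounded by scale / 2.
    return [v * scale for v in q]

q, s = quantize([0.1, -0.5, 0.25])
approx = dequantize(q, s)
```

Each value is stored in 4 bits instead of 32, at the cost of a bounded rounding error.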
Compositionality: The Power of Depth
Why stacking layers creates exponential expressiveness.
Stochasticity: The Regularizer in Disguise
Why noise is not the enemy of learning—it's the secret ingredient.
Locality: The License to Focus
Why 3×3 kernels beat global attention, and the assumption that makes efficient ML possible.
Smoothness: The License to Go Deep
Why ResNets train but plain networks don't, and the property that unlocked modern deep learning.
Symmetry: The Property That Designs Architectures
Why CNNs share weights, why GNNs aggregate neighbors, and why AlphaFold predicts proteins.
Separability: The Art of Factorization
Why MobileNet is 12x faster than ResNet, and how LoRA fine-tunes GPT-3 with 10,000x fewer parameters.
Sparsity: The License to Skip
Why ignoring most of your neural network is the key to efficiency.
Domain Transformations: The Art of Finding Easier Spaces
Why logarithms prevent underflow, why Fourier speeds up convolutions, and how choosing the right space makes hard problems tractable.
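The log-domain trick from the teaser can be shown in a few lines: a product of many small probabilities underflows to zero, while the same computation in log space stays exact (a generic illustration, not the article's example):

```python
import math

def logsumexp(logs):
    # log(sum(exp(l))) computed stably by factoring out the maximum.
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

naive = 1.0
for _ in range(1000):
    naive *= 1e-5          # underflows to exactly 0.0 along the way

log_total = 1000 * math.log(1e-5)  # ~ -11512.9, representable with no trouble
```

Multiplication in the original space becomes addition in log space, the easier space for this problem.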
Linearity: Why Batching Works
And the property that makes neural network training computationally tractable.
Commutativity: Why Transformers Need Positional Encodings
And other consequences of order not mattering in ML architectures.
The One Property That Makes FlashAttention Possible
Associativity is the license to parallelize, chunk, and stream.
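The license in miniature: because the combining operation is associative, a reduction can be regrouped into independent chunks and the partial results merged afterwards (a sketch with integer addition, where regrouping is exact; floating-point addition is only approximately associative, which is why regrouped kernels report tiny numeric differences):

```python
def chunked_sum(xs, chunk=4):
    # Associativity licenses regrouping: reduce each chunk independently
    # (in parallel, per GPU tile, or streamed), then combine the partials.
    partials = [sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk)]
    return sum(partials)

assert chunked_sum(list(range(10))) == sum(range(10))
```

FlashAttention applies the same idea to the softmax-weighted sum, processing attention in tiles that never need the full matrix in memory at once.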
Virtual Functions Strike Again
Or the price of too much dynamism. Exploring virtual dispatch overhead and the -fstrict-vtable-pointers optimization.
sprintf vs std::to_chars
Or the cost of hidden dependencies. Comparing numeric-to-string conversion performance.
Generating Random Integers in Python
Or don't make assumptions about performance based on the API alone.
Speeding up Julia's searchsortedfirst
Or even standard libraries can be improved.
Faster toupper Implementation
Or binary trickery for the win.
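The kind of trick the teaser hints at: in ASCII, a lowercase letter differs from its uppercase form only in bit 0x20, so uppercasing is a single AND (a sketch of the bit manipulation, not the article's implementation):

```python
def fast_upper(ch):
    # 'a' (0x61) and 'A' (0x41) differ only in bit 0x20; clearing that bit
    # uppercases 'a'..'z'. A range check guards everything else.
    c = ord(ch)
    if 0x61 <= c <= 0x7A:   # 'a'..'z'
        c &= ~0x20
    return chr(c)
```

Real implementations often vectorize this, applying the mask to many bytes at once.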
Faster Fibonacci
Or the hidden power of semigroups.
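The semigroup angle in brief: 2x2 integer matrices form a semigroup under multiplication, and associativity is exactly what licenses exponentiation by squaring, giving F(n) in O(log n) multiplies instead of O(n) additions (a standard sketch, assuming the identity M^n = [[F(n+1), F(n)], [F(n), F(n-1)]] for M = [[1, 1], [1, 0]]):

```python
def fib(n):
    # Exponentiation by squaring over the matrix semigroup.
    def mul(A, B):
        (a, b), (c, d) = A
        (e, f), (g, h) = B
        return ((a * e + b * g, a * f + b * h),
                (c * e + d * g, c * f + d * h))

    result = ((1, 0), (0, 1))  # identity matrix
    M = ((1, 1), (1, 0))
    while n:
        if n & 1:
            result = mul(result, M)
        M = mul(M, M)              # square — legal only because mul is associative
        n >>= 1
    return result[0][1]            # F(n)
```

Any associative operation gets the same speedup, which is the hidden power the article's title promises.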