Perf Bits
A collection of articles exploring software performance optimization techniques across different languages and paradigms.
Articles
Involutions: The License to Update in O(1)
Why Zobrist hashing updates a chessboard in constant time, and the property that makes incremental computation possible.
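The constant-time update the teaser describes rests on XOR being its own inverse. A minimal sketch (a hypothetical 4-square, 2-piece toy board, not the article's code):

```python
import random

random.seed(0)
# One random 64-bit key per (square, piece) pair — a toy 4-square, 2-piece board.
KEYS = [[random.getrandbits(64) for _ in range(2)] for _ in range(4)]

def toggle(h, square, piece):
    # XOR with a fixed key is an involution: applying the same key twice
    # restores h, so placing and removing a piece are the same O(1) update.
    return h ^ KEYS[square][piece]

h = 0
h = toggle(h, 2, 1)   # place piece 1 on square 2
h = toggle(h, 2, 1)   # remove it again
assert h == 0         # back to the empty-board hash
```

The full hash never needs recomputing from scratch; every move is two XORs.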
Beyond Snowflake: A Problem-Solver's Guide to Distributed ID Generation
How memory allocators teach us to generate unique IDs at scale—and how to discover solutions, not just memorize them.
Quantization: The License to Approximate
Why 4 bits can do the work of 32, and the mathematical reason neural networks tolerate imprecision.
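The core mechanics can be sketched with symmetric linear quantization (an illustrative toy, assuming signed 4-bit integers and a single per-tensor scale):

```python
def quantize(xs, bits=4):
    # Map floats onto signed ints in [-2^(b-1), 2^(b-1) - 1] via one scale factor.
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(x) for x in xs)
    scale = amax / qmax if amax else 1.0
    q = [max(-qmax - 1, min(qmax, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate floats; the error is bounded by scale / 2.
    return [v * scale for v in q]

q, s = quantize([0.1, -0.5, 0.25])
approx = dequantize(q, s)
```

Each value is stored in 4 bits instead of 32, at the cost of a bounded rounding error.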
Compositionality: The Power of Depth
Why stacking layers creates exponential expressiveness.
Stochasticity: The Regularizer in Disguise
Why noise is not the enemy of learning—it's the secret ingredient.
Locality: The License to Focus
Why 3×3 kernels beat global attention, and the assumption that makes efficient ML possible.
Smoothness: The License to Go Deep
Why ResNets train but plain networks don't, and the property that unlocked modern deep learning.
Symmetry: The Property That Designs Architectures
Why CNNs share weights, why GNNs aggregate neighbors, and why AlphaFold predicts proteins.
Separability: The Art of Factorization
Why MobileNet is 12x faster than ResNet, and how LoRA fine-tunes GPT-3 with 10,000x fewer parameters.
Sparsity: The License to Skip
Why ignoring most of your neural network is the key to efficiency.
Domain Transformations: The Art of Finding Easier Spaces
Why logarithms prevent underflow, why Fourier speeds up convolutions, and how choosing the right space makes hard problems tractable.
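The log-domain trick from the teaser can be shown in a few lines: a product of many small probabilities underflows to zero, while the same computation in log space stays exact (a generic illustration, not the article's example):

```python
import math

def logsumexp(logs):
    # log(sum(exp(l))) computed stably by factoring out the maximum.
    m = max(logs)
    return m + math.log(sum(math.exp(l - m) for l in logs))

naive = 1.0
for _ in range(1000):
    naive *= 1e-5          # underflows to exactly 0.0 along the way

log_total = 1000 * math.log(1e-5)  # ~ -11512.9, representable with no trouble
```

Multiplication in the original space becomes addition in log space, the easier space for this problem.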
Linearity: Why Batching Works
And the property that makes neural network training computationally tractable.
Commutativity: Why Transformers Need Positional Encodings
And other consequences of order not mattering in ML architectures.
The One Property That Makes FlashAttention Possible
Associativity is the license to parallelize, chunk, and stream.
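The license in miniature: because the combining operation is associative, a reduction can be regrouped into independent chunks and the partial results merged afterwards (a sketch with integer addition, where regrouping is exact; floating-point addition is only approximately associative, which is why regrouped kernels report tiny numeric differences):

```python
def chunked_sum(xs, chunk=4):
    # Associativity licenses regrouping: reduce each chunk independently
    # (in parallel, per GPU tile, or streamed), then combine the partials.
    partials = [sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk)]
    return sum(partials)

assert chunked_sum(list(range(10))) == sum(range(10))
```

FlashAttention applies the same idea to the softmax-weighted sum, processing attention in tiles that never need the full matrix in memory at once.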
Virtual Functions Strike Again
Or the price of too much dynamism. Exploring virtual dispatch overhead and the -fstrict-vtable-pointers optimization.
sprintf vs std::to_chars
Or the cost of hidden dependencies. Comparing numeric-to-string conversion performance.
Generating Random Integers in Python
Or don't make assumptions about performance based on the API alone.
Speeding up Julia's searchsortedfirst
Or even standard libraries can be improved.
Faster toupper Implementation
Or binary trickery for the win.
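The kind of trick the teaser hints at: in ASCII, a lowercase letter differs from its uppercase form only in bit 0x20, so uppercasing is a single AND (a sketch of the bit manipulation, not the article's implementation):

```python
def fast_upper(ch):
    # 'a' (0x61) and 'A' (0x41) differ only in bit 0x20; clearing that bit
    # uppercases 'a'..'z'. A range check guards everything else.
    c = ord(ch)
    if 0x61 <= c <= 0x7A:   # 'a'..'z'
        c &= ~0x20
    return chr(c)
```

Real implementations often vectorize this, applying the mask to many bytes at once.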
Faster Fibonacci
Or the hidden power of semigroups.
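The semigroup angle in brief: 2x2 integer matrices form a semigroup under multiplication, and associativity is exactly what licenses exponentiation by squaring, giving F(n) in O(log n) multiplies instead of O(n) additions (a standard sketch, assuming the identity M^n = [[F(n+1), F(n)], [F(n), F(n-1)]] for M = [[1, 1], [1, 0]]):

```python
def fib(n):
    # Exponentiation by squaring over the matrix semigroup.
    def mul(A, B):
        (a, b), (c, d) = A
        (e, f), (g, h) = B
        return ((a * e + b * g, a * f + b * h),
                (c * e + d * g, c * f + d * h))

    result = ((1, 0), (0, 1))  # identity matrix
    M = ((1, 1), (1, 0))
    while n:
        if n & 1:
            result = mul(result, M)
        M = mul(M, M)              # square — legal only because mul is associative
        n >>= 1
    return result[0][1]            # F(n)
```

Any associative operation gets the same speedup, which is the hidden power the article's title promises.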