Every Trick, All at Once

Or: What happens when you unleash algorithmic parallelism on a single chip

We've now seen four families of techniques, each attacking a different part of the memory wall:

Speculative decoding amortizes memory trips across multiple tokens (Chapter 1)
Discrete diffusion converts the problem from memory-bound to compute-bound by generating blocks in parallel (Chapter 2)
Attention compression shrinks what needs to be stored and moved (Chapter 3)
Prefill optimization speeds up the input processing phase (Chapter 4)

The natural question: do these techniques conflict? Can you deploy speculative decoding AND compressed attention AND prompt compression all at once?

Surprisingly, they don't conflict: you can deploy them together, and their gains compound.
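
To make "compound" concrete, here is a back-of-the-envelope sketch, not a measurement. It assumes an Amdahl-style split of request time into a prefill share and a decode share, and it assumes (optimistically) that decode-side speedups multiply independently. The function name combined_speedup and every number in the example are hypothetical, chosen only for illustration.

```python
def combined_speedup(prefill_fraction, prefill_speedup, decode_speedups):
    """Rough end-to-end speedup when several techniques are stacked.

    prefill_fraction: share of baseline time spent on prefill (0..1).
    prefill_speedup:  speedup applied to the prefill share only.
    decode_speedups:  speedups applied to the decode share; assumed to
                      multiply independently (an idealization).
    """
    if not 0.0 <= prefill_fraction <= 1.0:
        raise ValueError("prefill_fraction must be between 0 and 1")
    if prefill_speedup <= 0 or any(s <= 0 for s in decode_speedups):
        raise ValueError("speedups must be positive")

    decode_fraction = 1.0 - prefill_fraction

    # Multiply the decode-side speedups together.
    decode_total = 1.0
    for s in decode_speedups:
        decode_total *= s

    # New time as a fraction of baseline, then invert to get the speedup.
    new_time = prefill_fraction / prefill_speedup + decode_fraction / decode_total
    return 1.0 / new_time


# Hypothetical inputs: 20% of time in prefill, a 2x prefill speedup,
# and two decode-side techniques worth 2x and 1.5x each.
example = combined_speedup(0.2, 2.0, [2.0, 1.5])
print(f"estimated end-to-end speedup: {example:.2f}x")
```

The multiplication inside the decode share is what compounding means here. Real deployments will fall short of this estimate wherever the techniques interact, for example when compressing attention changes how many speculated tokens are accepted.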