The Common Thread

Every variant of speculative decoding — classic, MEDUSA, EAGLE-1/2/3, MTP — exploits the same insight: the bottleneck is memory bandwidth, not compute.

Each verification pass loads the model from memory once and checks multiple tokens. You pay for one memory trip and get multiple tokens validated. The numbers vary, but the economics are universal:

Vanilla decoding: 1 token per memory load
Classic speculative decoding (50% acceptance): ~2 tokens per load
MTP (85% acceptance): ~2.8 tokens per load
EAGLE-3: 6--7 tokens per load

Each improvement doubles or triples the effective utilization of the GPU's memory subsystem. And because the verifier guarantees correctness, every token is exactly what the original model would have produced.

It's free performance. Or, more accurately, it's paid for in algorithmic cleverness rather than silicon.