EAGLE: From Token Guessing to Feature Guessing

The EAGLE family (SafeAILab, 2024-2025) took MEDUSA's idea and pushed it much further. Instead of predicting raw tokens, EAGLE's draft model predicted the hidden features of the target model's final layer, the vectors that feed its output head. Features are richer than tokens. A token is a single entry, a word or word fragment, from a vocabulary of roughly 128,000 entries. A feature is a vector with thousands of dimensions that encodes not just which token is coming next, but much of the context the model used to choose it.
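To make the token-versus-feature distinction concrete, here is a minimal sketch with toy numbers. The names LM_HEAD, decode, and feature_error are hypothetical, the four-entry vocabulary and three-dimensional features are made up, and nothing here is EAGLE's actual code. It only shows that a drafted feature vector can be pushed through the target's output head to recover a token, and that the vector carries more than the token alone.

```python
# Toy illustration: a "feature" is a vector, and the target model's output
# head turns it into a token. All numbers here are made up for illustration.

# Hypothetical output head: one weight row per vocabulary entry.
# Real models have roughly 128,000 rows and thousands of columns.
LM_HEAD = {
    "the": [1.0, 0.0, 0.0],
    "cat": [0.0, 1.0, 0.0],
    "sat": [0.0, 0.0, 1.0],
    "dog": [0.0, 0.7, 0.7],
}

def logits(feature):
    """Score every vocabulary entry as a dot product with the feature."""
    return {tok: sum(w * x for w, x in zip(row, feature))
            for tok, row in LM_HEAD.items()}

def decode(feature):
    """Pick the highest-scoring token (greedy decoding)."""
    scores = logits(feature)
    return max(scores, key=scores.get)

def feature_error(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

target_feature = [0.10, 0.90, 0.20]   # what the target model would compute
drafted_feature = [0.15, 0.85, 0.25]  # the draft model's guess at that vector

print(decode(target_feature), decode(drafted_feature))
print(round(feature_error(target_feature, drafted_feature), 3))
```

A token-level drafter would output only the winning token. In this toy setup the drafted feature is slightly off, yet it decodes to the same token as the target's feature, and its scores still show which alternative came close.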

Predicting features instead of tokens is like requiring the assistant to guess not just the memo's conclusion but the reasoning behind it. It is a more demanding target, but the guesses are much more accurate.

EAGLE-1 showed that feature-level drafting could match or exceed MEDUSA's speedups with a smaller draft model. EAGLE-2 added dynamic draft trees: instead of guessing a single straight line of future tokens, the draft model branched its guesses like a tree, with the shape of the tree adjusted on the fly according to the draft model's confidence. If the model thought "the cat sat" was one possibility and "the dog ran" was another, it would propose both as branches. The verification pass could then accept whichever branch matched the target model's own output.
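The branch-and-verify idea can be sketched in a few lines. This is a simplified illustration, not EAGLE-2's implementation: the tree is a plain nested dict, make_target is a hypothetical stand-in for the target model under greedy decoding, and verification is simulated one step at a time, whereas a real system scores every node of the tree in a single forward pass.

```python
def longest_accepted_path(tree, target_next_token):
    """Walk the draft tree, following the branch the target model agrees with.

    tree: nested dict mapping a drafted token to its subtree of continuations.
    target_next_token: callable taking the accepted prefix (a tuple of tokens)
        and returning the token the target model would produce next.
    """
    accepted = []
    node = tree
    while node:  # stop once we run out of drafted continuations
        wanted = target_next_token(tuple(accepted))
        if wanted not in node:
            break  # no drafted branch matches; the rest is discarded
        accepted.append(wanted)
        node = node[wanted]
    return accepted

def make_target(continuation):
    """Hypothetical target model: always continues with a fixed sequence."""
    def target_next_token(prefix):
        return continuation[len(prefix)]
    return target_next_token

# Two drafted branches sharing the first token: "the cat sat" and "the dog ran".
draft_tree = {
    "the": {
        "cat": {"sat": {}},
        "dog": {"ran": {}},
    }
}

print(longest_accepted_path(draft_tree, make_target(["the", "dog", "ran", "home"])))
print(longest_accepted_path(draft_tree, make_target(["the", "bird", "flew"])))
```

With the first target, the whole "the dog ran" branch is accepted and the "cat" branch is dropped. With the second, only "the" survives, because neither drafted branch continues with "bird".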

Think of it like an executive who asks the assistant, "What are the two or three most likely things I'd sign off on here?" instead of "What's the single most likely thing?" Branched guessing captures more of the probability mass, and since at most one branch can match what the target model actually produces, covering several candidates raises the odds that one of them is accepted.
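A little arithmetic shows why proposing several candidates helps. The sketch below uses a made-up draft distribution over the next token and assumes, for illustration only, that the draft model's probabilities are a fair estimate of how often each candidate matches the target. The function name covered_mass is hypothetical.

```python
def covered_mass(probs, k):
    """Total probability of the k most likely candidates."""
    return sum(sorted(probs.values(), reverse=True)[:k])

# Hypothetical draft distribution for the token after "the".
next_token_probs = {"cat": 0.5, "dog": 0.3, "bird": 0.1, "fish": 0.1}

for k in (1, 2, 3):
    print(k, round(covered_mass(next_token_probs, k), 2))
```

Under that assumption, a single guess covers half of the probability mass here, while two branches cover about 0.8 and three about 0.9. Each extra branch costs more verification work, which is why the shape of the tree matters.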

But EAGLE-2 hit a ceiling. The feature-predicting approach forced the draft model to mimic the target model's exact intermediate representations. Like a student forced to solve a problem using exactly the teacher's method, the draft model's creativity was constrained. Its accuracy plateaued; more training data stopped helping.