The Sculptor vs. The Potter
The best analogy for understanding diffusion comes from art, not engineering.
Autoregressive language models are like potters working at a wheel. They start with a lump of clay and add material one pinch at a time, building the pot incrementally. Each addition depends on what came before. The potter can't work on the rim until the base is formed, and can't work on the handle until the body is complete. If the potter makes a mistake — a wobble, a crack — they either live with it or scrap the whole piece and start over.
Diffusion models are like sculptors working in marble. They start with a solid block — an undifferentiated mass. The sculpture is already "in there" in some sense; the sculptor's job is to reveal it by removing everything that doesn't belong. A sculptor can work on the whole block at once — chipping away at the top, the sides, the front, the back simultaneously. At any point in the process, the entire form is present in rough outline; it's just a matter of how much detail has been resolved.
This is the fundamental difference. Potters add. Sculptors reveal. And revealing from a whole block can be done in parallel, with every region refined in the same pass and guided by the rough outline of the whole, while adding one pinch of clay at a time is inherently sequential.
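To make the contrast concrete, here is a toy sketch in Python. It is not a real language model or a real diffusion model: `toy_next_token` and `toy_denoise` are hypothetical stand-ins for learned networks, and the "target" the sculptor converges on is hard-coded purely for illustration. The only thing the sketch is meant to show is the shape of the two loops: the potter needs one model call per token, while the sculptor needs one call per refinement pass, however long the sequence is.

```python
import random


def potter_generate(length, next_token):
    """Autoregressive toy: one token per step, each conditioned on the prefix."""
    seq = []
    calls = 0
    for _ in range(length):
        # The next token cannot be computed until the prefix exists.
        seq.append(next_token(seq))
        calls += 1
    return seq, calls


def sculptor_generate(length, steps, denoise, seed=0):
    """Diffusion-style toy: start from noise, refine every position each pass."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]  # the uncarved block
    calls = 0
    for t in range(steps, 0, -1):
        # A single call updates all positions together.
        x = denoise(x, t)
        calls += 1
    return x, calls


def toy_next_token(prefix):
    # Hypothetical stand-in for a learned model: the next "token" is
    # simply the current prefix length.
    return len(prefix)


def toy_denoise(x, t):
    # Hypothetical stand-in for a learned denoiser: blend every position
    # toward a hard-coded target shape, with weight 1/t. On the final
    # pass (t == 1) the weight is 1, so the target is fully revealed.
    w = 1.0 / t
    target = [float(i) for i in range(len(x))]
    return [(1.0 - w) * xi + w * ti for xi, ti in zip(x, target)]


if __name__ == "__main__":
    for n in (8, 64):
        _, potter_calls = potter_generate(n, toy_next_token)
        _, sculptor_calls = sculptor_generate(n, steps=4, denoise=toy_denoise)
        print(f"length={n}: potter calls={potter_calls}, sculptor calls={sculptor_calls}")
```

Note what the sketch does and does not claim. The sculptor's passes still happen one after another, so the process is not free of sequence; the parallelism is across positions within a pass. How many passes a real model needs, and how expensive each one is, depends on the model and is not something this toy can tell you.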