Diffusion Language Models Know the Answer Before Decoding
Abstract
Prophet, a training-free fast decoding paradigm for diffusion language models, reduces inference time by leveraging early answer convergence without sacrificing quality.
Diffusion language models (DLMs) have recently emerged as an alternative to autoregressive approaches, offering parallel sequence generation and flexible token orders. However, their inference remains slower than that of autoregressive models, primarily due to the cost of bidirectional attention and the large number of refinement steps required for high-quality outputs. In this work, we highlight and leverage an overlooked property of DLMs, early answer convergence: in many cases, the correct answer can be internally identified by the halfway point of the refinement process, well before the final decoding step, under both semi-autoregressive and random remasking schedules. For example, on GSM8K and MMLU, up to 97% and 99% of instances, respectively, can be decoded correctly using only half of the refinement steps. Building on this observation, we introduce Prophet, a training-free fast decoding paradigm that enables early-commit decoding. Specifically, Prophet dynamically decides whether to continue refinement or to go "all-in" (i.e., decode all remaining tokens in one step), using the confidence gap between the top-2 prediction candidates as the criterion. It integrates seamlessly into existing DLM implementations, incurs negligible overhead, and requires no additional training. Empirical evaluations on LLaDA-8B and Dream-7B across multiple tasks show that Prophet reduces the number of decoding steps by up to 3.4x while preserving high generation quality. These results recast DLM decoding as a problem of deciding when to stop sampling, and demonstrate that early answer convergence provides a simple yet powerful mechanism for accelerating DLM inference, complementary to existing speedup techniques. Our code is publicly available at https://github.com/pixeli99/Prophet.
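The early-commit rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`should_commit`, `prophet_decode`), the fixed per-position threshold of 0.9, and the `toy_step` refinement simulator are all assumptions for this sketch; the actual criterion and threshold schedule used by Prophet may differ.

```python
def confidence_gap(probs):
    """Gap between the top-2 candidate probabilities at one position."""
    top2 = sorted(probs, reverse=True)[:2]
    return top2[0] - top2[1]

def should_commit(masked_probs, threshold=0.9):
    """Early-commit criterion (hypothetical): trigger 'all-in' decoding once
    every still-masked position has a top-2 confidence gap above threshold."""
    return all(confidence_gap(p) >= threshold for p in masked_probs)

def prophet_decode(step_fn, num_steps, threshold=0.9):
    """Refinement loop with early commit.
    step_fn(t) returns per-position probability distributions at step t."""
    for t in range(num_steps):
        probs = step_fn(t)
        if should_commit(probs, threshold):
            break  # go all-in: greedily decode every remaining token now
    tokens = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return tokens, t + 1

# Hypothetical demo: distributions sharpen as refinement proceeds,
# so the model becomes confident long before the step budget is spent.
def toy_step(t):
    c = min(0.5 + 0.1 * t, 0.99)
    return [[c, 1.0 - c]] * 3
```

With a budget of 20 steps, `prophet_decode(toy_step, 20)` commits after 6 steps in this toy setup, illustrating how early answer convergence translates into skipped refinement steps.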
Community
In this paper, we identify a key overlooked property of diffusion language models (DLMs): early answer convergence, where correct answers emerge well before the final decoding step. Building on this insight, we propose Prophet, a training-free fast decoding paradigm that dynamically monitors the confidence gap and triggers early commit decoding. Our experiments on LLaDA-8B and Dream-7B show that Prophet achieves up to 3.4× speedup with negligible accuracy loss, offering a simple yet powerful solution for accelerating DLM inference.
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing (2025)
- Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs (2025)
- Beyond Fixed: Training-Free Variable-Length Denoising for Diffusion Large Language Models (2025)
- MDPO: Overcoming the Training-Inference Divide of Masked Diffusion Language Models (2025)
- Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models (2025)
- A Survey on Diffusion Language Models (2025)
- Sparse-dLLM: Accelerating Diffusion LLMs with Dynamic Cache Eviction (2025)