arxiv:2507.14958

MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Published on Jul 20
Submitted by xufangzhi on Jul 25
#3 Paper of the day

Abstract

Momentum Uncertainty-guided Reasoning (MUR) dynamically allocates computational resources to improve reasoning efficiency and accuracy in Large Language Models without additional training.

AI-generated summary

Large Language Models (LLMs) have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computations. This work investigates how to efficiently and adaptively guide LLM test-time scaling without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budgets to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide in-depth theoretical proof to support the superiority of MUR in terms of stability and biases. MUR is comprehensively evaluated against various TTS methods across four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using different sizes of recent Qwen3 models (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 50% on average while improving accuracy by 0.62-3.37%.
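
For readers who want the mechanism in concrete terms, below is a minimal sketch of the control loop the abstract describes, built from the paper's stated ingredients (stepwise uncertainty, a momentum aggregate, and a single gamma knob). The specific choices here are assumptions, not the paper's exact formulation: step uncertainty is taken as the mean negative log-probability of the step's tokens, the momentum aggregate is an exponential moving average with coefficient beta, and extra test-time compute is triggered when the current step's uncertainty exceeds gamma times the momentum value. `generate_step`, `scale_step`, and `is_done` are hypothetical hooks for the underlying LLM and whichever TTS method is plugged in.

```python
def step_uncertainty(token_logprobs):
    """Illustrative step-level uncertainty: mean negative log-probability
    of the tokens generated in one reasoning step (an assumption; the
    paper may define the stepwise signal differently)."""
    return -sum(token_logprobs) / max(len(token_logprobs), 1)

def mur_reasoning(generate_step, scale_step, is_done,
                  beta=0.9, gamma=1.0, max_steps=64):
    """Sketch of Momentum Uncertainty-guided Reasoning (MUR).

    generate_step(steps) -> (text, token_logprobs): one cheap reasoning step.
    scale_step(steps)    -> (text, token_logprobs): the same step redone with
                            extra test-time compute (any plugged-in TTS method).
    is_done(steps)       -> bool: stop condition (e.g., final answer emitted).
    beta:  momentum coefficient of the uncertainty EMA.
    gamma: the gamma-control budget knob; smaller values trigger scaling
           more often.
    """
    steps = []
    momentum = None  # momentum-aggregated uncertainty m_t
    for _ in range(max_steps):
        text, logprobs = generate_step(steps)
        u = step_uncertainty(logprobs)
        if momentum is not None and u > gamma * momentum:
            # The current step looks unusually uncertain relative to the
            # running momentum, so spend extra compute on this step only.
            text, logprobs = scale_step(steps)
            u = step_uncertainty(logprobs)
        # EMA update: m_t = beta * m_{t-1} + (1 - beta) * u_t
        momentum = u if momentum is None else beta * momentum + (1 - beta) * u
        steps.append(text)
        if is_done(steps):
            break
    return steps
```

Read this way, gamma-control is a pure budget dial: gamma near 0 scales almost every step (approaching a per-step-scaling baseline), while a large gamma almost never triggers scaling and recovers plain step-by-step CoT.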

Community


🙏 Clarification Questions

Hello, I’m not deeply familiar with this research area, so I may have misunderstood certain points. I appreciate your help in clarifying the following:

  1. Performance of MUR vs. Per‑Step Scale

Why does MUR outperform Per‑Step Scale, even though Per‑Step Scale applies full scaling at every step? In Figure 4, the dashed line representing Per‑Step Scale accuracy (i.e., an upper‑bound baseline) falls below the MUR curve. Did you analyze the reasons for this phenomenon? For example, is it possible that MUR can scale multiple times per step, while Per‑Step Scale scales strictly once per step?

  2. Number of Reasoning Steps: MUR vs. CoT vs. Per‑Step Scale

MUR appears to use fewer average reasoning steps than standard CoT, and even fewer in the case of Per‑Step Scale (Figure 5). Why is that? Also, I believe "Per‑Step Scale Accuracy" in Figure 5 is a typo for "Per‑Step Scale", is that right?

How are reasoning steps defined and segmented in your experiments?

Thank you very much for your time and assistance; I really appreciate your help in understanding these points.


Models citing this paper 1

Datasets citing this paper 0

Spaces citing this paper 0

Collections including this paper 8