- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 85
- SSRL: Self-Search Reinforcement Learning
  Paper • 2508.10874 • Published • 93
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity
  Paper • 2506.06941 • Published • 15
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 181
Collections including paper arxiv:2506.01939
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 181
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 136
- The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
  Paper • 2505.22617 • Published • 130
- Reinforcement Pre-Training
  Paper • 2506.08007 • Published • 260
- Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
  Paper • 2506.06395 • Published • 130
- Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
  Paper • 2506.05176 • Published • 72
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 272
- Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
  Paper • 2505.24726 • Published • 272
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 181
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 136
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning
  Paper • 2506.07044 • Published • 113
- ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
  Paper • 2506.09513 • Published • 98
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- Seedance 1.0: Exploring the Boundaries of Video Generation Models
  Paper • 2506.09113 • Published • 101
- Will It Still Be True Tomorrow? Multilingual Evergreen Question Classification to Improve Trustworthy QA
  Paper • 2505.21115 • Published • 139
- Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
  Paper • 2506.01939 • Published • 181
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 136