R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Annealing and Reinforce Learning Paper • 2508.21113 • Published 5 days ago • 86
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning Paper • 2508.16949 • Published 10 days ago • 21
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs Paper • 2508.16153 • Published 11 days ago • 121
NVIDIA Nemotron Collection Open, Production-ready Enterprise Models. Nvidia Open Model license. • 3 items • Updated 4 days ago • 55
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published 19 days ago • 26
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published 26 days ago • 168