Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 180
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 184
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution Paper • 2411.02359 • Published Nov 4, 2024 • 13
LLM-based Optimization of Compound AI Systems: A Survey Paper • 2410.16392 • Published Oct 21, 2024 • 17
DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints Paper • 2405.19026 • Published May 29, 2024 • 8