ThinkSwitcher: When to Think Hard, When to Think Fast Paper • 2505.14183 • Published May 20 • 1
FuseRL: Dense Preference Optimization for Heterogeneous Model Fusion Paper • 2504.06562 • Published Apr 9
WebSailor: Navigating Super-human Reasoning for Web Agent Paper • 2507.02592 • Published Jul 3 • 113
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning Paper • 2505.17667 • Published May 23 • 89
Advantage-Guided Distillation for Preference Alignment in Small Language Models Paper • 2502.17927 • Published Feb 25 • 1
QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization Paper • 2505.18092 • Published May 23 • 44
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion Paper • 2503.04222 • Published Mar 6 • 15