Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning Paper • 2508.20751 • Published 6 days ago • 85
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published 7 days ago • 35
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience Paper • 2508.04700 • Published 28 days ago • 51
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published Aug 1 • 62
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction Paper • 2507.15852 • Published Jul 21 • 38
Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • Jul 8 • 645
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs Paper • 2506.19290 • Published Jun 24 • 51
ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing Paper • 2506.19848 • Published Jun 24 • 26
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs Paper • 2506.14429 • Published Jun 17 • 45
VideoRoPE: What Makes for Good Video Rotary Position Embeddi Collection A storage repo for VideoRoPE. • 6 items • Updated Jun 17 • 3
Article Introducing Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training By codelion • May 17 • 9
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models Paper • 2406.13542 • Published Jun 19, 2024 • 17
MM-IFEngine Collection [ICCV 2025] Official Implementation of "MM-IFEngine: Towards Multimodal Instruction Following" • 2 items • Updated Jul 16 • 5
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11 • 130
MM-IFEngine: Towards Multimodal Instruction Following Paper • 2504.07957 • Published Apr 10 • 34
Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published Apr 3 • 57