Submitted by foggyforest 184 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models · 22 authors 449 3
Submitted by scofield7419 83 On Path to Multimodal Generalist: General-Level and General-Bench · 32 authors 19 9
Submitted by akhaliq 28 Generating Physically Stable and Buildable LEGO Designs from Text · 6 authors 1.3k 2
Submitted by vvibt 28 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models · 13 authors 90 4
Submitted by shengz 15 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains · 12 authors 3
Submitted by WHB139426 14 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant · 9 authors 2
Submitted by Samir55 13 PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes · 9 authors 23 2
Submitted by arianhosseini 12 Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers · 5 authors 3
Submitted by RanjanSapkota 8 Vision-Language-Action Models: Concepts, Progress, Applications and Challenges · 4 authors 2
Submitted by dogtooth 7 SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning · 2 authors 2
Submitted by PALIN2018 4 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese · 16 authors 87 2