Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning Paper • 2503.15558 • Published Mar 18 • 51
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints Paper • 2503.16408 • Published Mar 20 • 41
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published Mar 25 • 52
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos Paper • 2503.17973 • Published Mar 23 • 8
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation Paper • 2503.10546 • Published Mar 13 • 3
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published Mar 16 • 69
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes Paper • 2503.13435 • Published Mar 17 • 18
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning Paper • 2503.21860 • Published Mar 27 • 5
CaRL: Learning Scalable Planning Policies with Simple Rewards Paper • 2504.17838 • Published Apr 24 • 3
R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning Paper • 2505.02835 • Published May 5 • 27
Interactive Post-Training for Vision-Language-Action Models Paper • 2505.17016 • Published May 22 • 6
ScanBot: Towards Intelligent Surface Scanning in Embodied Robotic Systems Paper • 2505.17295 • Published May 22 • 9
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models Paper • 2506.07961 • Published Jun 9 • 12
EmbodiedGen: Towards a Generative 3D World Engine for Embodied Intelligence Paper • 2506.10600 • Published Jun 12 • 7
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published 5 days ago • 21