ObjectReact: Learning Object-Relative Control for Visual Navigation
Abstract
A new object-relative control paradigm using a topometric map representation and a local controller achieves better invariance and generalization in visual navigation tasks compared to image-relative methods.
Visual navigation using only a single camera and a topological map has recently become an appealing alternative to methods that require additional sensors and 3D maps. This is typically achieved through an "image-relative" approach to estimating control from a given pair of current observation and subgoal image. However, image-level representations of the world have limitations because images are strictly tied to the agent's pose and embodiment. In contrast, objects, being a property of the map, offer an embodiment- and trajectory-invariant world representation. In this work, we present a new paradigm of learning "object-relative" control that exhibits several desirable characteristics: a) new routes can be traversed without strictly requiring imitation of prior experience, b) the control prediction problem can be decoupled from solving the image matching problem, and c) high invariance can be achieved in cross-embodiment deployment for variations across both training-testing and mapping-execution settings. We propose a topometric map representation in the form of a "relative" 3D scene graph, which is used to obtain more informative object-level global path planning costs. We train a local controller, dubbed "ObjectReact", conditioned directly on a high-level "WayObject Costmap" representation that eliminates the need for an explicit RGB input. We demonstrate the advantages of learning object-relative control over its image-relative counterpart across sensor height variations and multiple navigation tasks that challenge the underlying spatial understanding capability, e.g., navigating a map trajectory in the reverse direction. We further show that our sim-only policy generalizes well to real-world indoor environments. Code and supplementary material are accessible via the project page: https://object-react.github.io/
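To make the abstract's key idea concrete, below is a minimal, hypothetical sketch of a local controller conditioned on a WayObject-style costmap rather than raw RGB. This is not the authors' implementation: the class name ObjectReactPolicySketch, the two-channel costmap, the input resolution, and the 2-D velocity action space are all illustrative assumptions. The point it illustrates is that, because the costmap is rendered from the map and current localization rather than from the camera image, the control policy is decoupled from image matching and can stay invariant to camera height and embodiment changes.

```python
# Minimal sketch (assumptions, not the paper's architecture): an "object-relative"
# local controller that consumes a WayObject-style costmap instead of RGB.
import torch
import torch.nn as nn


class ObjectReactPolicySketch(nn.Module):
    """Hypothetical policy: per-pixel object/path cost image -> velocity command."""

    def __init__(self, costmap_channels: int = 2, hidden_dim: int = 128):
        super().__init__()
        # Small CNN encoder over the costmap; e.g., one channel could carry
        # object-level global path-planning costs from the relative 3D scene
        # graph, another a local traversability/free-space estimate.
        self.encoder = nn.Sequential(
            nn.Conv2d(costmap_channels, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 2),  # (linear velocity, angular velocity)
        )

    def forward(self, wayobject_costmap: torch.Tensor) -> torch.Tensor:
        # wayobject_costmap: (B, C, H, W). No RGB input is required, so the same
        # policy can in principle transfer across embodiments as long as the
        # costmap can be rendered from the map and current localization.
        return self.head(self.encoder(wayobject_costmap))


policy = ObjectReactPolicySketch()
fake_costmap = torch.rand(1, 2, 96, 96)  # placeholder costmap for illustration
v, w = policy(fake_costmap)[0]
print(f"linear={v.item():.3f}, angular={w.item():.3f}")
```

A usage note on the design choice this sketch mirrors: swapping the RGB input for a costmap moves the perception burden (object detection, matching against the scene graph) upstream of the controller, so the controller itself only has to learn reactive geometry-to-velocity behavior.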
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- PixelNav: Towards Model-based Vision-Only Navigation with Topological Graphs (2025)
- OVSegDT: Segmenting Transformer for Open-Vocabulary Object Goal Navigation (2025)
- MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding (2025)
- Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation (2025)
- ActLoc: Learning to Localize on the Move via Active Viewpoint Selection (2025)
- Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities (2025)
- SkeNa: Learning to Navigate Unseen Environments Based on Abstract Hand-Drawn Maps (2025)