Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning

¹Tsinghua University, ²Beijing University of Technology, ³Guangzhou University

Abstract

Vision-centric hierarchical embodied models have demonstrated strong potential for long-horizon robotic control. However, existing methods lack spatial awareness, which limits how effectively they turn visual plans into actionable control in complex environments.

To address this problem, we propose Spatial Policy (SP), a unified spatial-aware framework for visuomotor robotic manipulation built on explicit spatial modeling and reasoning. Specifically, we first design a spatial-conditioned embodied video generation module that models spatially guided predictions through a spatial plan table. Then, we propose a spatial-based action prediction module to infer executable actions with coordination. Finally, we propose a spatial reasoning feedback policy that refines the spatial plan table via dual-stage replanning.
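As a rough illustration of how the three components could fit together in a closed loop, the toy sketch below wires a plan table, a frame predictor, an action predictor, and a replanning step into one episode. Everything in it is an assumption made for illustration: the `SpatialPlanTable` structure (a list of 2D waypoints), all function names, and the reading of "dual-stage" as one check before execution and one after. None of it is taken from the SP implementation, and the learned models are replaced by trivial stand-ins.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SpatialPlanTable:
    # Hypothetical structure: ordered 2D waypoints for the end effector.
    waypoints: List[Point]


def generate_frames(pos: Point, table: SpatialPlanTable) -> List[Point]:
    # Stand-in for spatial-conditioned video generation: each "frame" is
    # reduced to a predicted end-effector position along the plan.
    return [pos] + list(table.waypoints)


def predict_actions(frames: List[Point]) -> List[Point]:
    # Stand-in for spatial-based action prediction: the displacement
    # between consecutive predicted positions.
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(frames, frames[1:])]


def plan_is_valid(table: SpatialPlanTable, workspace: float) -> bool:
    # Assumed stage 1: reject plans that leave the workspace before executing.
    return all(abs(x) <= workspace and abs(y) <= workspace
               for x, y in table.waypoints)


def replan(table: SpatialPlanTable, goal: Point, workspace: float) -> SpatialPlanTable:
    # Toy refinement: clamp waypoints into the workspace and end at the goal.
    def clamp(v: float) -> float:
        return max(-workspace, min(workspace, v))

    pts = [(clamp(x), clamp(y)) for x, y in table.waypoints]
    if not pts or pts[-1] != goal:
        pts.append(goal)
    return SpatialPlanTable(pts)


def reached(pos: Point, goal: Point, tol: float = 1e-6) -> bool:
    return abs(pos[0] - goal[0]) <= tol and abs(pos[1] - goal[1]) <= tol


def run_episode(start: Point, goal: Point, table: SpatialPlanTable,
                workspace: float = 1.0, max_replans: int = 3) -> Tuple[bool, int]:
    """Return (success, number of replans used)."""
    pos, replans = start, 0
    while True:
        if plan_is_valid(table, workspace):
            # Execute the actions inferred from the predicted frames.
            for dx, dy in predict_actions(generate_frames(pos, table)):
                pos = (pos[0] + dx, pos[1] + dy)
            # Assumed stage 2: check the outcome after execution.
            if reached(pos, goal):
                return True, replans
        if replans == max_replans:
            return False, replans
        table = replan(table, goal, workspace)
        replans += 1


if __name__ == "__main__":
    bad_plan = SpatialPlanTable([(2.0, 0.0)])  # leaves the workspace
    print(run_episode((0.0, 0.0), (0.5, 0.5), bad_plan))
```

In this toy setting the initial plan is rejected before execution, refined once, and then executed successfully. The point of the sketch is only the control flow: the plan table is the single object that both the generator and the feedback step read and write.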

Extensive experiments show that SP significantly outperforms state-of-the-art baselines, achieving a 33.0% average improvement over the best baseline. With an 86.7% average success rate across 11 diverse tasks, SP substantially enhances the practicality of embodied models for robotic control applications.

Keywords: robotic manipulation, spatial-aware generation, spatial-aware reasoning, embodied AI, diffusion models

Spatial Policy method overview

Overview of the Spatial Policy framework for spatial-aware visuomotor robotic manipulation.

Embodied Video Generation Result

Camera views: Corner, Corner2, Corner3

Tasks: assembly, basketball, button-press, button-press-topdown, door-close, door-open, faucet-close, faucet-open, hammer, handle-press, shelf-place

BibTeX

@article{liu2025spatialpolicy,
  author    = {Liu, Yijun and Liu, Yuwei and Meng, Yuan and Zhang, JieHeng and Zhou, Yuwei and Li, Ye and Jiang, Jiacheng and Ji, Kangye and Ge, Shijia and Wang, Zhi and Zhu, Wenwu},
  title     = {Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning},
  journal   = {arXiv preprint arXiv:2508.15874},
  year      = {2025},
  eprint    = {2508.15874},
  archivePrefix = {arXiv},
  primaryClass = {cs.RO}
}