Abstract: In reinforcement learning (RL), spatial planning is often mediated through rasterized observations processed by convolutional networks, even when the underlying task is continuous and geometric. This discretization can introduce aliasing and obscure topological structure, making the spatial problem harder. We study a hierarchical, set-valued, geometry-first observation interface for sparse-reward navigation that operates directly on triangulated obstacle geometry. The interface uses learned multi-token aggregation to compress variable-sized geometry into a bounded, fixed-size representation while preserving the local spatial structure relevant to spatial decision making. In a controlled goal-conditioned point-navigation setting with a fixed RL backbone, we compare this interface against raster-CNN baselines across bounded and unbounded procedural training regimes. Our results show that the interface's advantage is most pronounced under continual exposure to newly generated environments, where the agent must learn reusable spatial structure rather than memorize a fixed set of environments.
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~William_T_Redman1
Submission Number: 8671