Ariadne: Advancing Real-world Path-finding Capabilities of VLMs via Difficulty-aware Reinforcement Learning

Minghe Shen; Zhuo Zhi; Chonghan Liu; Shuo Xing; Zhengzhong Tu; Che Liu

03 Sept 2025 (modified: 12 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Reinforcement Learning, Vision–Language Model, GRPO

Abstract: Vision-Language Models (VLMs) have recently achieved impressive reasoning capabilities, obtained largely through reinforcement learning and demonstrated on tasks such as mathematical problem solving. However, whether such methods can extend the fundamental reasoning bounds of VLMs to out-of-distribution complexity remains underexplored, because the cumulative and interconnected nature of knowledge in domains like mathematics makes it difficult to create truly isolated training and testing splits. To address this, we investigate multistep spatial reasoning, a domain where task difficulty can be systematically controlled. We introduce Ariadne, a training and evaluation framework centered on pathfinding puzzles whose complexity is precisely defined by path length and turn count. This allows us to train on a curriculum of simpler puzzles and evaluate generalization on quantifiably harder, unseen tasks (e.g., training on paths with $\le 3$ steps and testing on paths with $\ge 5$ steps). Our experiments reveal that while a strong base model like Qwen-VL-7B-Instruct fails on paths longer than two steps, our model, trained with reinforcement learning with verifiable rewards (RLVR), successfully generalizes to five-step puzzles unseen during training. This result demonstrates that reinforcement learning can genuinely extend the intrinsic reasoning capabilities of VLMs. Surprisingly, although trained exclusively on synthetic mazes, Ariadne also improves performance on real-world benchmarks such as MapBench and ReasonMap, showing that core spatial reasoning skills transfer even when the visual inputs differ entirely, from simple synthetic mazes to complex real-world maps.
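
As a rough illustration of the difficulty-controlled split described in the abstract, the sketch below computes path length and turn count for a path and separates puzzles into an easy training set and a harder held-out test set. It is not the authors' implementation: the move encoding ("U", "D", "L", "R"), the reading of a "step" as one unit move, and the function names `path_length`, `turn_count`, and `split_by_difficulty` are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumption: a path is a list of unit moves, each one of "U", "D", "L", "R".
Path = List[str]


def path_length(moves: Path) -> int:
    """Number of steps in the path (assumed: one step per unit move)."""
    return len(moves)


def turn_count(moves: Path) -> int:
    """Number of times the direction changes between consecutive moves."""
    return sum(1 for prev, cur in zip(moves, moves[1:]) if prev != cur)


def split_by_difficulty(
    puzzles: List[Path],
    max_train_steps: int = 3,
    min_test_steps: int = 5,
) -> Tuple[List[Path], List[Path]]:
    """Split puzzles so that training paths are short and test paths are long.

    Paths whose length falls strictly between the two thresholds are left
    out of both sets, which keeps a gap between training and test difficulty.
    """
    train = [p for p in puzzles if path_length(p) <= max_train_steps]
    test = [p for p in puzzles if path_length(p) >= min_test_steps]
    return train, test


if __name__ == "__main__":
    puzzles = [
        ["R"],
        ["R", "D", "R"],
        ["R", "R", "D", "D"],
        ["R", "R", "D", "D", "L"],
    ]
    train, test = split_by_difficulty(puzzles)
    print(len(train), len(test))
```

The default thresholds mirror the example in the abstract (training on paths with at most 3 steps, testing on paths with at least 5 steps); the actual difficulty definition used by Ariadne may differ.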

Primary Area: reinforcement learning

Submission Number: 1312
