Transductive Visual Programming: Evolving Tool Libraries from Experience for Spatial Reasoning

ICLR 2026 Conference Submission23479 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026, Venue: ICLR 2026, License: CC BY 4.0
Keywords: visual programming, spatial reasoning, tool abstraction
Abstract: The composition of specialized tools offers a powerful approach for complex visual reasoning, particularly for tasks involving 3D spatial understanding. However, existing visual programming methods are often constrained by fixed toolsets or offline tool induction, which leads to suboptimal solutions and poor tool reuse. We introduce Transductive Visual Programming (TVP), a novel framework that dynamically evolves a library of reusable tools by learning from its problem-solving experience. TVP abstracts recurring solution patterns into new, higher-level tools, which are then used to construct simpler and more effective programs for new tasks. On the challenging Omni3D-Bench, TVP establishes a new state of the art, outperforming both specialized vision-language models and prior visual programming systems. The evolved tools also exhibit strong generalization to out-of-domain queries on 3DSRBench, SpatialSense, and VGBench. Our work demonstrates that transductive tool evolution is a powerful and generalizable paradigm for building robust visual reasoning systems.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 23479