Keywords: Symbolic Abstraction, Task and Motion Planning, Vision-Language Models, Neuro-Symbolic AI, Long-Horizon Planning, Robot Learning, CALVIN Benchmark
TL;DR: This paper introduces a method to extract sound symbolic representations from Vision-Language Models to enable robust Task and Motion Planning for long-horizon robotics tasks, outperforming end-to-end and open-loop baselines.
Abstract: Complex long-horizon robotics tasks require combining high-level reasoning with low-level control. This paper presents a method to bridge this gap by learning sound symbolic abstractions from Vision-Language Models (VLMs) to enable robust Task and Motion Planning (TAMP). We propose a pipeline that converts the continuous confidence scores of a pre-trained VLM into discrete, verifiable symbolic predicates through confidence thresholding and temporal filtering. Our approach is evaluated on the challenging CALVIN benchmark, where it outperforms end-to-end and open-loop baselines, achieving a 68.5% task success rate. We provide a detailed analysis of the soundness-completeness trade-off inherent in learning abstractions and demonstrate the superiority of our closed-loop, neuro-symbolic architecture for long-horizon tasks.
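The core pipeline step the abstract describes (confidence thresholding plus temporal filtering to discretize VLM scores into predicates) could be sketched as follows. This is an illustrative sketch, not the paper's implementation; the threshold `tau`, window length `k`, and the function name are hypothetical parameters chosen for the example.

```python
# Hedged sketch: turn per-frame VLM confidence scores for one predicate
# into a discrete, sound truth assignment. The predicate is asserted only
# after its score stays above a threshold for k consecutive frames, which
# suppresses transient false positives (favoring soundness over
# completeness, as discussed in the abstract). tau and k are assumed
# hyperparameters, not values from the paper.
from collections import deque

def discretize_predicate(scores, tau=0.8, k=3):
    """Return one boolean per frame: True once confidence >= tau
    has held for k consecutive frames."""
    window = deque(maxlen=k)
    decisions = []
    for s in scores:
        window.append(s >= tau)
        decisions.append(len(window) == k and all(window))
    return decisions

# A single noisy spike (0.9 at frame 1) is filtered out; the predicate
# only turns True once confidence is sustained for k = 3 frames.
print(discretize_predicate([0.2, 0.9, 0.3, 0.85, 0.9, 0.95]))
# → [False, False, False, False, False, True]
```

The temporal window trades a short detection delay for fewer spurious predicate flips, which is the soundness-biased behavior a downstream TAMP planner needs to trust its symbolic state.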
Submission Number: 11