Learning Reliable PDDL Models for Classical Planning from Visual Data

Aymeric Barbin, Federico Cerutti, Alfonso Emilio Gerevini

Published: 2024, Last Modified: 13 May 2025, ICTAI 2024, CC BY-SA 4.0

Abstract: We propose R-latplan, a system that learns reliable symbolic (PDDL) representations of an agent's actions from noisy visual observations, without explicit human expert knowledge. R-latplan builds upon Latplan, a model that also learns PDDL representations of actions from images. However, Latplan does not ensure that the learned actions correspond to the agent's actual actions, and it does not map the learned actions to the agent's actual capabilities. There is, therefore, a substantial risk that a learned action is impossible given the domain's physics or the agent's capabilities. R-latplan receives as input pairs of (noisy) images representing the states before and after the agent performs an action in the domain. Unlike Latplan, it uses a transition identifier function that identifies the class of a transition and assigns it as the action label of the image pair. Our experimental analysis shows that: (1) R-latplan produces reliable PDDL models in which each action can be directly connected to one of the agent's high-level actuators and leads to visually correct states (the agent does not hallucinate); (2) the PDDL models generated by R-latplan enable a domain-independent planner to find optimal plans on each benchmark considered; (3) R-latplan is robust against mislabeled transitions, i.e., against errors introduced in the transition identifier function.
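
To make the idea of a transition identifier function concrete, here is a minimal sketch in Python. It is not the authors' implementation: the toy domain (a 1x3 sliding puzzle), the tuples standing in for images, and the names transition_identifier and label_transitions are all assumptions made for illustration. It only shows how each before/after pair could be given an action label and grouped by that label.

```python
from collections import defaultdict

# Hypothetical toy domain: a 1x3 sliding puzzle where 0 marks the blank cell.
# Each "image" is stood in for by a tuple of cell values (an assumption; the
# paper works with noisy images, not symbolic tuples).

def transition_identifier(before, after):
    """Hypothetical identifier: label a transition by how the blank cell moved."""
    delta = after.index(0) - before.index(0)
    return {1: "blank-right", -1: "blank-left"}.get(delta, "unknown")

def label_transitions(pairs):
    """Group (before, after) pairs under the action label assigned to each."""
    by_label = defaultdict(list)
    for before, after in pairs:
        by_label[transition_identifier(before, after)].append((before, after))
    return dict(by_label)

pairs = [
    ((0, 1, 2), (1, 0, 2)),  # blank moves one cell to the right
    ((1, 0, 2), (1, 2, 0)),  # blank moves one cell to the right
    ((1, 2, 0), (1, 0, 2)),  # blank moves one cell to the left
]
labeled = label_transitions(pairs)
print(sorted(labeled))
```

In this sketch each label corresponds to one high-level action of the agent, so every group of pairs describes a single, physically possible action rather than an arbitrary learned one.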