Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception

Published: 07 Nov 2023, Last Modified: 05 Dec 2023, FMDM@NeurIPS 2023
Keywords: Foundation Model, Sequential Decision-Making, Automaton-Based Representation, Formal Method, Verification, Perception
TL;DR: We develop an algorithm that uses foundation models to construct automaton-based task controllers, verify the controllers against task specifications, and ground the controllers to task environments.
Abstract: Recently developed multimodal pretrained models encode rich world knowledge expressed in multiple modalities, such as text and images. However, the outputs of these models cannot be directly integrated into algorithms for solving sequential decision-making tasks. We develop an algorithm that uses the knowledge from pretrained models to construct and verify controllers for sequential decision-making tasks and to ground these controllers to task environments through visual observations. In particular, the algorithm constructs an automaton-based controller that encodes the task-relevant knowledge extracted from the pretrained model. It then verifies whether the knowledge encoded in the controller is consistent with other independently available knowledge, which may include abstract information on the environment or user-provided specifications. If this verification step discovers an inconsistency, the algorithm automatically refines the controller to resolve it. Next, the algorithm leverages the vision and language capabilities of pretrained models to ground the controller to the task environment. We demonstrate the algorithm's ability to construct, verify, and ground automaton-based controllers through a suite of real-world tasks.
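The construct–verify–refine loop described in the abstract can be sketched in miniature as follows. This is an illustrative toy, not the paper's actual representation: the controller is a plain transition map, the specification is a simple "every run to the goal must use this action" property checked by reachability, and the refinement is a naive greedy transition removal. All state and action names are hypothetical.

```python
from collections import deque

# Hypothetical automaton-based controller: state -> {action: next_state}.
# The transitions are placeholders standing in for knowledge extracted
# from a pretrained model.
controller = {
    "start":     {"go_to_door": "at_door"},
    "at_door":   {"open_door": "door_open", "pick_key": "has_key"},
    "has_key":   {"open_door": "door_open"},
    "door_open": {"enter": "goal"},
}

def reachable(transitions, source, forbidden_action=None):
    """Breadth-first set of states reachable from `source`,
    optionally ignoring every edge labeled `forbidden_action`."""
    seen, frontier = {source}, deque([source])
    while frontier:
        state = frontier.popleft()
        for action, nxt in transitions.get(state, {}).items():
            if action == forbidden_action or nxt in seen:
                continue
            seen.add(nxt)
            frontier.append(nxt)
    return seen

def verify(transitions, spec_action, start="start", goal="goal"):
    """Toy specification check: every run reaching `goal` must use
    `spec_action`, i.e. the goal is unreachable once that action
    is forbidden."""
    return goal not in reachable(transitions, start, forbidden_action=spec_action)

def refine(transitions, spec_action, start="start", goal="goal"):
    """Naive refinement: while the spec fails, greedily remove one
    transition whose removal keeps the goal reachable."""
    refined = {s: dict(acts) for s, acts in transitions.items()}
    while not verify(refined, spec_action, start, goal):
        for state in refined:
            for action in list(refined[state]):
                if action == spec_action:
                    continue
                trial = {s: dict(acts) for s, acts in refined.items()}
                del trial[state][action]
                if goal in reachable(trial, start):
                    refined = trial
                    break
            else:
                continue
            break
        else:
            raise ValueError("cannot refine controller to satisfy spec")
    return refined
```

Under this toy specification ("the goal must only be reached after `pick_key`"), the initial controller fails verification because of the direct `at_door -> open_door` shortcut; refinement removes that transition while keeping the goal reachable via the key. The grounding step of the algorithm (mapping abstract states to visual observations) is not modeled here.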
Submission Number: 12