Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition


ICLR 2026 Conference Submission19439 Authors

19 Sept 2025 (modified: 08 Oct 2025), CC BY 4.0

Keywords: Robotics, Reinforcement Learning, Linear Temporal Logic, Logical Composition, Zero-shot

TL;DR: We develop a zero-shot mechanism to satisfy a Linear Temporal Logic (LTL) specification by solving for a set of existing safety-aware RL task primitives accepted by the product of the environment transition system and the specification automaton.
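To illustrate what searching "the product of the environment transition system and the specification automaton" can look like, here is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the specification is a co-safe formula already translated into a DFA, builds the product on the fly, and runs a breadth-first search for a run that reaches an accepting automaton state. The regions `r0` to `r3`, the propositions `a` and `b`, and the helpers `dfa_step` and `accepting_run` are all hypothetical, and the sketch omits the paper's pruning step and the mapping from product transitions to task primitives.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical environment transition system (TS): adjacency between regions
# and the set of atomic propositions holding in each region. Region r3 is
# multiply-labeled (both "a" and "b" hold there).
TS_EDGES: Dict[str, List[str]] = {
    "r0": ["r1", "r2"],
    "r1": ["r0", "r3"],
    "r2": ["r0", "r3"],
    "r3": ["r1", "r2"],
}
TS_LABELS: Dict[str, FrozenSet[str]] = {
    "r0": frozenset(),
    "r1": frozenset({"a"}),
    "r2": frozenset({"b"}),
    "r3": frozenset({"a", "b"}),
}
ACCEPTING = frozenset({2})


def dfa_step(q: int, label: FrozenSet[str]) -> int:
    """Hypothetical DFA for the co-safe formula F(a & F b).

    State 0: waiting for "a"; state 1: "a" seen, waiting for "b"; state 2: accept.
    """
    if q == 0 and "a" in label:
        return 2 if "b" in label else 1
    if q == 1 and "b" in label:
        return 2
    return q


def accepting_run(ts_init: str, q_init: int = 0) -> Optional[List[str]]:
    """Breadth-first search over the product TS x DFA, built on the fly.

    Returns the shortest region sequence that drives the DFA into an
    accepting state, or None if no such run exists.
    """
    start = (ts_init, dfa_step(q_init, TS_LABELS[ts_init]))
    parent: Dict[Tuple[str, int], Optional[Tuple[str, int]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        region, q = node
        if q in ACCEPTING:
            # Walk parent pointers back to the start to recover the path.
            path: List[str] = []
            cur: Optional[Tuple[str, int]] = node
            while cur is not None:
                path.append(cur[0])
                cur = parent[cur]
            return path[::-1]
        for nxt_region in TS_EDGES[region]:
            nxt = (nxt_region, dfa_step(q, TS_LABELS[nxt_region]))
            if nxt not in parent:  # each product state is expanded at most once
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(accepting_run("r0"))
```

For this toy map, `accepting_run("r0")` returns `["r0", "r1", "r3"]`: the agent first enters an `a` region and then a region where `b` also holds. Tracking the automaton state alongside the region is what lets the same map serve different specifications without relearning anything.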

Abstract: This work develops Comp-LTL, a zero-shot mechanism that lets an agent satisfy a Linear Temporal Logic (LTL) specification using existing task primitives trained via reinforcement learning (RL). Autonomous robots often need to satisfy spatial and temporal goals that are unknown until run time. Prior work learns policies for executing a task specified in LTL, but incorporates the specification into the learning process, so any change to the specification requires retraining the policy, either by fine-tuning or from scratch. We present a more flexible approach: learning a set of composable task primitive policies that can be used to satisfy arbitrary LTL specifications without retraining or fine-tuning. Task primitives can be learned offline using RL and combined using Boolean composition at deployment. This work focuses on constructing and pruning a transition system (TS) representation of the environment in order to find deterministic, unambiguous, and feasible solutions to LTL specifications, given an environment with multiply-labeled regions and a set of task primitive policies. We show that our pruned TS is deterministic, contains no unrealizable transitions, and is sound. Because training physical robots for every possible task is expensive, we train on a base set of tasks and compose them at run time; this reduces total training time compared to non-compositional approaches while adding negligible processing time at run time. We validate our approach in simulation and compare it to other state-of-the-art approaches, showing that Comp-LTL is safer, more adaptable, and quicker at satisfying unseen specifications at run time.
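The abstract states that task primitives are combined using Boolean composition at deployment. One common way to realize this in value-based RL, assumed here rather than taken from the paper, is to compose the primitives' Q-values: disjunction as an element-wise maximum, conjunction as an element-wise minimum, and negation defined relative to the values of assumed "maximum" and "minimum" tasks. The Q-value lists, the bounds `Q_MAX`/`Q_MIN`, and the function names are hypothetical, and the sketch works on per-action values at a single state for brevity.

```python
from typing import List

QValues = List[float]  # one Q-value per discrete action at the current state

# Hypothetical Q-values of two pretrained task primitives (4 actions each).
Q_A: QValues = [0.9, 0.1, 0.4, 0.2]  # primitive: reach a region labeled "a"
Q_B: QValues = [0.8, 0.2, 0.1, 0.7]  # primitive: reach a region labeled "b"

# Assumed values of the "maximum" and "minimum" tasks, used to define negation.
Q_MAX: QValues = [1.0, 1.0, 1.0, 1.0]
Q_MIN: QValues = [0.0, 0.0, 0.0, 0.0]


def q_or(q1: QValues, q2: QValues) -> QValues:
    """Disjunction: element-wise maximum of the two value estimates."""
    return [max(x, y) for x, y in zip(q1, q2)]


def q_and(q1: QValues, q2: QValues) -> QValues:
    """Conjunction: element-wise minimum of the two value estimates."""
    return [min(x, y) for x, y in zip(q1, q2)]


def q_not(q: QValues, q_max: QValues, q_min: QValues) -> QValues:
    """Negation: reflect q between the maximum- and minimum-task values."""
    return [hi + lo - v for v, hi, lo in zip(q, q_max, q_min)]


def greedy_action(q: QValues) -> int:
    """Index of the highest-valued action (first one on ties)."""
    return max(range(len(q)), key=lambda i: q[i])


if __name__ == "__main__":
    a_and_not_b = q_and(Q_A, q_not(Q_B, Q_MAX, Q_MIN))
    print("a & !b ->", greedy_action(a_and_not_b))
    print("a | b  ->", greedy_action(q_or(Q_A, Q_B)))
```

With these made-up values, the guard `a & !b` (enter a region labeled `a` but not `b`) composes to `q_and(Q_A, q_not(Q_B, Q_MAX, Q_MIN))`, whose greedy action is index 2, while `a | b` selects index 0. Changing the specification only changes which element-wise operators are applied; the primitives themselves are not retrained.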

Supplementary Material: zip

Primary Area: applications to robotics, autonomy, planning

Submission Number: 19439
