Comp-LTL: Temporal Logic Planning via Zero-Shot Policy Composition

19 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Robotics, Reinforcement Learning, Linear Temporal Logic, Logical Composition, Zero-shot
TL;DR: We develop a zero-shot mechanism to satisfy a Linear Temporal Logic (LTL) specification by solving for a set of existing safety-aware RL task primitives accepted by the product between the environment transition system and specification automaton.
Abstract: This work develops Comp-LTL, a zero-shot mechanism that lets an agent satisfy a Linear Temporal Logic (LTL) specification using existing task primitives trained via reinforcement learning (RL). Autonomous robots often need to safely and deterministically satisfy spatial and temporal goals that are unknown until run time. Prior work on learning policies to execute an LTL task incorporates the specification into the learning process, so the policy must be retrained or fine-tuned whenever the specification changes. We present a more flexible approach: a pipeline that deterministically chooses an execution set of composable safe task primitive policies that can satisfy arbitrary LTL specifications without retraining or fine-tuning. Safe task primitives can be learned offline using RL with a reward function focused on penalizing unsafe actions, and combined using Boolean composition at deployment. We focus on creating and pruning a transition system (TS) representation of the environment in order to solve for deterministic, unambiguous, and feasible solutions to LTL specifications, given an environment with multiply-labeled regions and a set of safe task primitive policies. Our pruned TS is deterministic, contains no unrealizable transitions, and is sound. Combining the TS with the safe pretrained task primitives produces a sequence of composed policies that is guaranteed to deterministically satisfy an LTL specification. Training on a base set of safe tasks and composing at run time reduces total training time compared to non-composition approaches and adds negligible processing time at run time. We verify our approach via simulation in grid-based and continuous environments and compare it to other state-of-the-art approaches, showing that Comp-LTL is safer, more adaptable, and quicker at satisfying unseen specifications at run time.
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 19439
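
The abstract's central idea, searching the product of an environment transition system and a specification automaton for an accepted run, can be illustrated with a small sketch. This is not the authors' implementation: the region names, labels, and the hand-coded automaton for "reach a, then reach b, never enter c" are all hypothetical, no LTL-to-automaton translation or TS pruning is performed, and a breadth-first search stands in for whatever solver the paper uses.

```python
from collections import deque

# Hypothetical toy transition system: region -> neighbouring regions.
TS_EDGES = {
    "start": ["hall"],
    "hall": ["start", "lab", "pit", "dock"],
    "lab": ["hall"],
    "pit": ["hall", "dock"],
    "dock": ["hall", "pit"],
}

# Each region carries a set of labels, so a region may hold several at once.
TS_LABELS = {
    "start": frozenset(),
    "hall": frozenset(),
    "lab": frozenset({"a"}),
    "pit": frozenset({"c"}),
    "dock": frozenset({"b"}),
}


def spec_step(q, labels):
    """Hand-coded automaton for 'reach a, then reach b, never enter c'.

    Returns the next automaton state, or None if the step violates the spec.
    """
    if "c" in labels:
        return None
    if q == 0 and "a" in labels:
        return 1
    if q == 1 and "b" in labels:
        return 2
    return q


ACCEPTING = {2}


def plan(ts_start, q0=0):
    """Breadth-first search over product states (region, automaton state)."""
    q_init = spec_step(q0, TS_LABELS[ts_start])
    if q_init is None:
        return None
    start = (ts_start, q_init)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        region, q = node
        if q in ACCEPTING:
            # Walk parent pointers back to the start to recover the route.
            path = []
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        for nxt_region in TS_EDGES[region]:
            q_next = spec_step(q, TS_LABELS[nxt_region])
            if q_next is None:
                continue  # transition would enter an unsafe region
            nxt = (nxt_region, q_next)
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no accepted run exists in the product


route = plan("start")
print(route)
```

In a setting like the one the abstract describes, each consecutive pair of regions in `route` would be handed to a pretrained task primitive (or a Boolean composition of primitives) rather than executed directly; that step is omitted here.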