Learning Long-Horizon Multi-Agent Coordination from Temporal Logic Specifications
Abstract: We study multi-agent reinforcement learning (MARL) under temporally extended Signal Temporal Logic (STL) objectives, which require reasoning over both long-horizon dynamics and inter-agent relations. We propose TD-MAT, a transformer-based architecture with multivariate positional encodings, causal temporal masking, and a decomposed reward based on arithmetic–geometric mean robustness with variance regularization. Experiments on coordination tasks ranging from unstructured multi-objective problems to strict temporal sequencing show that TD-MAT learns effective long-term behaviors and generalizes to heterogeneous agent settings. Ablation studies highlight the necessity of temporal masking, positional encodings, and reward decomposition, while comparisons to MAPPO, RMAPPO, and MAT reveal that transformers provide the greatest benefit on unstructured, long-horizon tasks.
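The abstract's reward is built on arithmetic–geometric mean (AGM) robustness. As a point of reference, a minimal sketch of the standard AGM robustness for a conjunction of STL predicates is given below; this is the commonly used formulation (geometric mean over positive margins, arithmetic mean over violations), not TD-MAT's full decomposed, variance-regularized reward, whose details are not specified in the abstract. The function name `agm_and` and the plain-list interface are illustrative choices.

```python
import math

def agm_and(robustness_values):
    """AGM robustness of a conjunction of STL sub-formulas.

    If every sub-formula is satisfied (all margins > 0), return the
    geometric mean of (1 + rho_i) minus 1, which rewards uniformly
    large margins. Otherwise, return the arithmetic mean of the
    violating (non-positive) margins, which penalizes every violation
    rather than only the worst one (as min-based robustness would).
    """
    n = len(robustness_values)
    if all(r > 0 for r in robustness_values):
        product = math.prod(1.0 + r for r in robustness_values)
        return product ** (1.0 / n) - 1.0
    return sum(min(r, 0.0) for r in robustness_values) / n
```

Unlike the classical min-based robustness, which is flat with respect to all but the tightest sub-formula, this score has a non-zero gradient through every sub-formula, which is what makes it attractive as a decomposable per-agent reward signal.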