Learning Long-Horizon Multi-Agent Coordination from Temporal Logic Specifications
Abstract: We study multi-agent reinforcement learning (MARL) under temporally extended Signal Temporal Logic (STL) objectives, which require reasoning over both long-horizon dynamics and inter-agent relations. We propose TD-MAT, a transformer-based architecture with multivariate positional encodings, causal temporal masking, and a decomposed reward based on arithmetic–geometric mean robustness with variance regularization. Experiments on coordination tasks ranging from unstructured multi-objective problems to strict temporal sequencing show that TD-MAT learns effective long-term behaviors and generalizes to heterogeneous agent settings. Ablation studies highlight the necessity of temporal masking, positional encodings, and reward decomposition, while comparisons to MAPPO, RMAPPO, and MAT reveal that transformers provide the greatest benefit on unstructured, long-horizon tasks.
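The abstract's reward is built on arithmetic–geometric mean (AGM) robustness. As a point of reference, a minimal sketch of the standard AGM robustness for a conjunction of STL predicates is given below; this is the commonly used formulation (geometric mean over positive margins, arithmetic mean over violations), not TD-MAT's full decomposed, variance-regularized reward, whose details are not specified in the abstract. The function name `agm_and` and the plain-list interface are illustrative choices.

```python
import math

def agm_and(robustness_values):
    """AGM robustness of a conjunction of STL sub-formulas.

    If every sub-formula is satisfied (all margins > 0), return the
    geometric mean of (1 + rho_i) minus 1, which rewards uniformly
    large margins. Otherwise, return the arithmetic mean of the
    violating (non-positive) margins, which penalizes every violation
    rather than only the worst one (as min-based robustness would).
    """
    n = len(robustness_values)
    if all(r > 0 for r in robustness_values):
        product = math.prod(1.0 + r for r in robustness_values)
        return product ** (1.0 / n) - 1.0
    return sum(min(r, 0.0) for r in robustness_values) / n
```

Unlike the classical min-based robustness, which is flat with respect to all but the tightest sub-formula, this score has a non-zero gradient through every sub-formula, which is what makes it attractive as a decomposable per-agent reward signal.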