Abstract: Accurate prediction of multi-agent behavior in dense, dynamic environments remains a central challenge for autonomous systems, with applications in robotic navigation, traffic management, and human-robot interaction. Conventional approaches often fail to jointly capture the spatial interdependencies among agents and the long-term temporal patterns required for reliable trajectory prediction. To address this limitation, a multimodal framework is proposed that integrates Graph Neural Networks (GNNs), Long Short-Term Memory (LSTM) networks, and Inverse Reinforcement Learning (IRL): spatial dependencies are represented through GNN layers, temporal dynamics are encoded by LSTMs, and reward functions are inferred from demonstrations via IRL. Multimodal fusion of trajectory histories and aerial imagery further enhances environmental awareness. The framework is evaluated on the Stanford Drone Dataset (SDD) and SUMO simulations, where higher accuracy, robustness, and safety are observed relative to established baselines, including LSTM, Social-LSTM, and representative spatio-temporal models. A deployment-oriented analysis covering computational efficiency, robustness to noise and occlusion, and safety metrics (collision rate and time-to-collision) is also provided, underscoring the practicality of the approach for real-world navigation.
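To make the described architecture concrete, the sketch below shows one plausible way to wire a GNN spatial-interaction layer, per-agent LSTM temporal encoding, and fusion with an aerial-image feature into a trajectory decoder. It is a minimal illustration assuming a PyTorch implementation; all module names, layer sizes, the dense-adjacency message passing, and the fusion strategy are assumptions rather than the authors' code, and the IRL reward-learning stage is omitted.

```python
# Illustrative sketch (not the paper's implementation): GNN + LSTM encoder
# with aerial-image fusion for multi-agent trajectory prediction.
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """One round of message passing over a dense agent-agent adjacency."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(2 * dim, dim)

    def forward(self, h, adj):
        # h: (N, dim) agent features; adj: (N, N) row-normalized adjacency
        msg = adj @ h                         # aggregate neighbor features
        return torch.relu(self.lin(torch.cat([h, msg], dim=-1)))

class TrajectoryPredictor(nn.Module):
    def __init__(self, hid=64, horizon=12):
        super().__init__()
        # Temporal encoding of each agent's (x, y) history.
        self.lstm = nn.LSTM(input_size=2, hidden_size=hid, batch_first=True)
        self.gnn = GraphLayer(hid)
        # Small CNN over a per-agent aerial-image crop (assumed 3x64x64).
        self.img_enc = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, hid))
        self.dec = nn.Linear(2 * hid, horizon * 2)  # future (x, y) offsets
        self.horizon = horizon

    def forward(self, hist, adj, img):
        # hist: (N, T, 2) past positions; img: (N, 3, 64, 64) aerial crops
        _, (h, _) = self.lstm(hist)           # temporal encoding per agent
        h = self.gnn(h[-1], adj)              # spatial interaction encoding
        z = torch.cat([h, self.img_enc(img)], dim=-1)  # multimodal fusion
        return self.dec(z).view(-1, self.horizon, 2)

# Toy usage: 5 agents, 8 observed steps, uniform neighbor weights.
N = 5
model = TrajectoryPredictor()
pred = model(torch.randn(N, 8, 2), torch.full((N, N), 1 / N),
             torch.randn(N, 3, 64, 64))
print(pred.shape)  # torch.Size([5, 12, 2])
```

In an IRL-based variant, a decoder like the one above would typically be replaced or reweighted by a planner that scores candidate futures under the learned reward function; that stage is not shown here.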
External IDs: dblp:conf/ictai/MohebbiKD25