Maneuver-Conditioned Decision Transformer for Tactical In-Flight Decision-Making

Published: 2024 · Last Modified: 17 Jan 2026 · IEEE Robotics and Automation Letters, 2024 · CC BY-SA 4.0
Abstract: Autonomous maneuver decision-making in air combat is a challenging task involving high-dimensional state-action spaces and nonlinear dynamics. Existing approaches are usually based on online learning paradigms, which hinders their application to real-world scenarios where online interaction, i.e., trial and error, is impractical or dangerous. In this paper, we explore an offline reinforcement learning framework for tactical air combat. To this end, we first construct a large-scale offline dataset of demonstrations collected from hand-designed planners, humans, and expert policies in an interactive simulator. We then propose a transformer-based architecture with a lightweight maneuver pool that stores and retrieves information for generating effective tactical maneuvers while preserving the sequential modeling ability of decision transformers. The maneuver pool is structured as a key-value memory, where key-value pairs are used to address and read the pool. We further introduce pool diversity and centralizing losses to train the offline policy, boosting the discriminative power of the features learned from the offline dataset. Our formulation allows us to learn maneuver-specific feature prototypes and to explicitly leverage this knowledge during inference. Extensive experiments and ablation studies demonstrate the effectiveness and flexibility of the proposed method, which outperforms other offline baselines.
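The abstract does not give the exact formulation, but the key-value maneuver pool and the two auxiliary losses can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the `ManeuverPool` class, the soft attention-style addressing, and the specific forms of the diversity and centralizing losses are hypothetical stand-ins, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ManeuverPool(nn.Module):
    """Hypothetical key-value memory of learnable maneuver prototypes.

    Pool size, dimensions, and the softmax addressing scheme are
    illustrative assumptions, not the paper's exact architecture.
    """

    def __init__(self, pool_size: int = 32, key_dim: int = 128, value_dim: int = 128):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, key_dim))      # addressing keys
        self.values = nn.Parameter(torch.randn(pool_size, value_dim))  # maneuver prototypes

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # query: (batch, key_dim) feature from the decision-transformer trunk.
        scores = query @ self.keys.t() / self.keys.shape[-1] ** 0.5  # (batch, pool_size)
        weights = F.softmax(scores, dim=-1)                          # soft addressing
        return weights @ self.values                                 # read: weighted prototype


def pool_diversity_loss(keys: torch.Tensor) -> torch.Tensor:
    # Assumed form: penalize pairwise cosine similarity between pool keys
    # so the stored prototypes remain distinct from one another.
    k = F.normalize(keys, dim=-1)
    sim = k @ k.t()
    off_diag = sim - torch.eye(len(keys), device=keys.device)
    return off_diag.pow(2).mean()


def centralizing_loss(query: torch.Tensor, keys: torch.Tensor) -> torch.Tensor:
    # Assumed form: pull each query feature toward its nearest key,
    # tightening maneuver-specific feature clusters.
    dists = torch.cdist(query, keys)  # (batch, pool_size)
    return dists.min(dim=-1).values.mean()
```

Under these assumptions, the pool output would be fused back into the transformer's action head, and the two auxiliary losses would be added to the standard offline sequence-modeling objective with weighting coefficients tuned on the offline dataset.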