Learning Continuous 3-DoF Air-to-Air Close-in Combat Strategy using Proximal Policy Optimization

Published: 01 Jan 2022, Last Modified: 10 May 2023. CoG 2022
Abstract: Air-to-air close-in combat builds on many basic fighter maneuvers and can largely be modeled as an algorithmic function of its inputs. This paper studies autonomous close-in combat, aiming to learn a strategy that adapts to different circumstances when fighting an opponent. Current methods for learning close-in combat strategies are largely limited to discrete action sets, whether in the form of rules, actions, or sub-policies. In contrast, we consider a one-on-one air combat game with a continuous action space and present a deep reinforcement learning method based on proximal policy optimization (PPO) that learns a close-in combat strategy from observations in an end-to-end manner. The state space is designed to promote the learning efficiency of PPO. We also design a minimax strategy for the game. Simulation results show that the learned PPO agent is able to defeat the minimax opponent with about a 97% win rate.
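The paper's implementation is not reproduced here, but the clipped surrogate objective that standard PPO maximizes can be sketched as follows. This is a minimal, generic illustration of PPO's clipping mechanism (the function name, `eps` default, and plain-Python style are assumptions, not the authors' code):

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al., 2017).

    For each sample, the probability ratio r = exp(new_logp - old_logp)
    is clipped to [1 - eps, 1 + eps], and the objective takes the
    pessimistic minimum of the clipped and unclipped terms, averaged
    over samples. Clipping discourages destructively large policy
    updates, which is one reason PPO suits continuous control tasks
    like the 3-DoF combat setting described in the abstract.
    """
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                       # pi_new / pi_old
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))  # clip to [1-eps, 1+eps]
        total += min(ratio * adv, clipped * adv)         # pessimistic bound
    return total / len(advantages)
```

For a continuous action space such as this paper's, the log-probabilities would typically come from a Gaussian policy whose mean and standard deviation are output by the policy network.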