Combining Dense and Sparse Rewards to Improve Deep Reinforcement Learning Policies in Reach-Avoid Games with Faster Evaders in Two vs. One Scenarios
Abstract: This paper investigates a variation of the reach-avoid game, a multi-agent pursuit-evasion scenario applicable to aerial defense, in which the evaders are faster than the defenders. Using deep reinforcement learning (DRL), the study proposes a reward function that combines dense (distance-based) and sparse (outcome-based) components. Focusing on the defender's perspective, this reward function yields effective learned policies against faster evaders, outperforming both traditional differential-game strategies and DRL policies trained with dense-only rewards. Moreover, the learned policy generalizes across different instances of the problem, including changes in pursuer speeds and winning radii, handling situations unseen during training.
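The combination of dense and sparse rewards described in the abstract can be illustrated with a minimal sketch: a small per-step distance-based shaping term plus large outcome-based terminal rewards for capture or breach. This is not the paper's exact formulation; the function, coefficients, and radii below are illustrative assumptions.

```python
import numpy as np

def combined_reward(defender_pos, evader_pos, target_pos,
                    capture_radius=0.5, target_radius=0.5,
                    w_dense=0.01, r_win=10.0, r_lose=-10.0):
    """Return (reward, done) for one defender at the current timestep."""
    dist_def_evader = np.linalg.norm(defender_pos - evader_pos)
    dist_evader_target = np.linalg.norm(evader_pos - target_pos)

    # Sparse, outcome-based terms: large terminal reward when the evader is
    # captured, large penalty when the evader reaches the protected target.
    if dist_def_evader <= capture_radius:
        return r_win, True
    if dist_evader_target <= target_radius:
        return r_lose, True

    # Dense, distance-based shaping: small per-step signal that rewards
    # closing on the evader and keeping it far from the target.
    dense = w_dense * (dist_evader_target - dist_def_evader)
    return dense, False
```

In this kind of scheme, the dense term gives the policy a learning signal on every step, while the sparse terms anchor the objective to the actual game outcome.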