Abstract: The anti-saturation attack problem is vital in the field of military defense. When confronted with a large-scale missile attack of short duration, the question of how to optimally allocate interceptor missiles so as to minimize the total loss of assets by the end of the attack has been studied by many researchers. Recently, deep reinforcement learning methods have been applied to obtain suboptimal defense policies, yet several problems remain. Convergence depends heavily on computing resources, and the learned policies cannot be easily explained. Moreover, when the attack parameters of the environment change, the model must be retrained from scratch, which limits its use in real-time decision-making scenarios. To this end, we propose a hybrid method that speeds up the approximation of the optimal policy and can explain attack patterns from a statistical point of view. Furthermore, its lightweight structure makes real-time deployment possible. First, an integer convex optimization model is set up to generate initial data that are fed to the neural network. Second, a neural network and temporal-difference (TD) methods are used to predict the final asset value. Finally, the proposed method is tested in a simulation environment under different battlefield parameters. Compared with two other methods (a heuristic and a Deep Q-Network, DQN), experiments indicate that the hybrid method performs better, converging nearly 20% faster than the DQN method and achieving an expected final asset value 12% higher when trained for the same number of episodes.
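As a rough illustration of the TD prediction step mentioned above, the following minimal sketch shows a single TD(0) update of a predicted final asset value. It is not the authors' implementation: it swaps the neural network for a plain lookup table, and the function name `td0_update`, the state labels, and the values of `alpha` and `gamma` are assumptions chosen for illustration only.

```python
def td0_update(values, state, next_state, reward,
               alpha=0.1, gamma=1.0, terminal=False):
    """Apply one tabular TD(0) update and return the TD error.

    values     : dict mapping a state to its predicted final asset value
                 (a table stands in for the neural network; assumption)
    alpha      : learning rate (hypothetical value)
    gamma      : discount factor (hypothetical value)
    terminal   : True if the attack ends on this transition
    """
    current = values.get(state, 0.0)
    # Bootstrapped target: observed reward plus the discounted estimate of
    # the next state, or the reward alone once the attack has ended.
    if terminal:
        target = reward
    else:
        target = reward + gamma * values.get(next_state, 0.0)
    td_error = target - current
    # Move the current estimate a small step toward the target.
    values[state] = current + alpha * td_error
    return td_error


if __name__ == "__main__":
    values = {}
    # Hypothetical two-step episode: "s1" ends with 10 units of surviving assets.
    td0_update(values, "s1", None, reward=10.0, terminal=True)
    td0_update(values, "s0", "s1", reward=0.0)
    print(values)
```

In a setting like the one described, the table would be replaced by a network trained toward the same bootstrapped target, with its initial estimates presumably seeded from the optimization model's output.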