Abstract: An optimal network defense decision-making method is crucial for maximizing the effectiveness of blue agents' actions against red attackers under resource constraints. This study addresses two challenges: (1) characterizing the defense process more reasonably, and (2) the low exploration efficiency of reinforcement learning (RL). This work moves beyond the usual focus in cybersecurity on attackers and defenders alone by incorporating green users and using their network availability as a metric for evaluating blue agents. We model the interactions between the agents and the network as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and use reinforcement learning with value-decomposition networks (VDN) to obtain an optimal defense strategy for multiple agents. We implement the model and evaluate the trained agents on an existing, well-designed scenario. The experimental results show that our algorithm outperforms several baselines.
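For readers unfamiliar with VDN, the additive decomposition it relies on can be sketched as follows. This is the standard VDN formulation, not a statement of this work's specific architecture or reward design, which the abstract does not detail. The notation is assumed here for illustration: N is the number of blue agents, \tau_i and a_i are the local action-observation history and action of agent i, and Q_i is that agent's individual value function.

```latex
% Standard VDN assumption: the joint action-value is approximated
% by a sum of per-agent values, each conditioned only on local information.
Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{a}) \approx \sum_{i=1}^{N} Q_i(\tau_i, a_i)

% Because the sum is separable, the greedy joint action is obtained
% by each agent maximizing its own Q_i independently.
\arg\max_{\mathbf{a}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{a})
  = \Bigl( \arg\max_{a_1} Q_1(\tau_1, a_1), \dots, \arg\max_{a_N} Q_N(\tau_N, a_N) \Bigr)
```

Under this decomposition, training can use a single team-level signal while each agent still acts on its own partial observation, which is what makes the approach compatible with a Dec-POMDP model.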