Scalable and Generalizable RL Agents for Attack Path Discovery via Continuous Invariant Spaces

Franco Terranova, Abdelkader Lahmadi, Isabelle Chrisment

Published: 2025, Last Modified: 30 Apr 2026, RAID 2025, CC BY-SA 4.0

Abstract: Identifying critical attack paths in a network (sequences of vulnerabilities an attacker can chain to achieve a specific threat model) is crucial for pinpointing vulnerable areas where defensive measures should be focused. Recently, Reinforcement Learning (RL) has gained traction for training agents to identify these critical paths. However, current solutions typically train RL agents tailored to a specific environment, defined by a fixed network structure and vulnerability set, and require costly retraining whenever either changes. This limitation arises from optimizing the agent to map between discrete input and output spaces, treating network nodes and vulnerabilities as atomic discrete elements. In this paper, we propose a method for constructing continuous and invariant input and output spaces for RL agents, enabling them to learn transferable policies that generalize across diverse network configurations and vulnerability sets. We also release Continuous CyberBattleSim (C-CyberBattleSim), an enhanced version of Microsoft CyberBattleSim designed to train agents with the novel continuous spaces. The tool is further extended to integrate real-world vulnerability data and a new scenario generation pipeline to improve the realism of training and testing environments. Agents trained in continuous spaces are assessed in 800 scenarios of varying sizes and with various allocations of 829 real-world vulnerabilities, demonstrating an average improvement of 9.3x in scalability over agents trained in discrete spaces, as well as an average generalization score of 89% on more complex scenarios when trained in simpler ones. A final study evaluates whether continuous agents trained in simulation can adapt to real-world and emulated scans. On average, agents achieve 75% of the score they would have achieved if trained directly on the scans, demonstrating effective knowledge transfer.

External IDs: dblp:conf/raid/TerranovaLC25