SCRIPT: A Scalable Continual Reinforcement Learning Framework for Autonomous Penetration Testing

Shicheng Zhou, Jingju Liu, Yuliang Lu, Jiahai Yang, Yue Zhang, Bo Lin, Xiaofeng Zhong, Shulong Hu

Published: 01 Jan 2025, Last Modified: 09 Oct 2025 | Expert Syst. Appl. 2025 | CC BY-SA 4.0

Abstract: Compared with manual penetration testing (pentesting), autonomous pentesting offers advantages in autonomy and efficiency and has become an active research area. Reinforcement learning (RL) is a natural fit for studying this topic. However, the proliferation of interconnected devices and systems has expanded attack surfaces, and new vulnerabilities emerge constantly, making real-world pentesting tasks and scenarios non-stationary. This non-stationarity presents a new challenge for RL-based autonomous pentesting: agents must generalize well enough to learn new tasks continually while mitigating catastrophic forgetting. Previous research in this field has overlooked this real-world challenge. To address it, we present SCRIPT, the first scalable continual RL framework for autonomous pentesting. SCRIPT trains the agent to learn continually, as humans do, over a large-scale task sequence, allowing it to achieve positive forward transfer (leveraging previously learned knowledge to accelerate new task learning) and to resist catastrophic forgetting. Inspired by humans’ lifelong learning process, we separate SCRIPT’s learning process into two phases: new task learning and knowledge consolidation. For new task learning, we introduce a scaffolded curriculum learning approach that bootstraps learning through rolling exploration and policy imitation. For knowledge consolidation, we propose a policy retention method that combines regularization-based weight importance constraints with progressive self-distillation-based policy retrospection to mitigate catastrophic forgetting. Experiments are conducted using Vulhub, a widely used repository of vulnerable Docker environments. Results demonstrate that SCRIPT enables agents to achieve positive forward transfer and strengthens their resistance to catastrophic forgetting, making it a feasible approach for pentesting agents to handle real-world non-stationarity and expanded attack surfaces.

External IDs: dblp:journals/eswa/ZhouLLYZLZH25