Keywords: reinforcement learning, insider threat, offline DQN.
TL;DR: We formulate insider-threat prevention as an RL problem and show that offline DQN can learn non-trivial intervention policies from constructed transitions.
Abstract: Insider threats are among the hardest security threats to detect because malicious behaviour is hidden within legitimate user activity. Unlike classification approaches that label each observation independently, we frame insider-threat prevention as a reinforcement learning problem in which an agent monitors user behaviour over fixed time windows and decides, at each window, whether to continue monitoring or to block the user. A key property of our setting is that ground-truth labels let us simulate the consequences of both continuing and blocking at every time step. This enables us to train a Deep Q-Network variant on a pre-constructed dataset of transitions and Proximal Policy Optimisation on a fully simulated environment. Preliminary results across five user-level folds show that the agent detects 42–64% of malicious users, typically intervening within 0–2 time windows of the first malicious activity. False positives remain a challenge, highlighting the need for improved reward design, better state representations, and richer intervention actions.
Submission Number: 123
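
To illustrate how ground-truth labels make both actions simulable at every time step, the following is a minimal sketch of building an offline transition dataset for one user and computing one-step Q-learning targets from it. It is not the authors' implementation: the function names, the reward values (`r_tp`, `r_fp`, `r_miss`), the zero vector used as a terminal placeholder state, and the rule that blocking counts as correct once any malicious window has occurred are all assumptions made for the example.

```python
import numpy as np

CONTINUE, BLOCK = 0, 1  # the two actions described in the abstract


def build_transitions(windows, malicious, r_tp=1.0, r_fp=-1.0, r_miss=-1.0):
    """Build (state, action, reward, next_state, done) tuples for one user.

    windows:   (T, d) array, one feature vector per time window.
    malicious: length-T boolean array of ground-truth window labels.
    Reward values are illustrative assumptions, not taken from the paper.
    """
    windows = np.asarray(windows, dtype=float)
    malicious = np.asarray(malicious, dtype=bool)
    T = len(windows)
    # seen[t] is True once malicious activity has occurred at or before window t.
    seen = np.cumsum(malicious) > 0
    transitions = []
    for t in range(T):
        state = windows[t]
        terminal_state = np.zeros_like(state)  # placeholder, never bootstrapped from
        # BLOCK always ends the episode; the label tells us whether it was justified.
        block_reward = r_tp if seen[t] else r_fp
        transitions.append((state, BLOCK, block_reward, terminal_state, True))
        # CONTINUE moves to the next window, or ends the episode at the last one.
        last = t == T - 1
        next_state = terminal_state if last else windows[t + 1]
        continue_reward = r_miss if malicious[t] else 0.0
        transitions.append((state, CONTINUE, continue_reward, next_state, last))
    return transitions


def td_targets(transitions, q_next, gamma=0.99):
    """One-step Q-learning targets: r + gamma * max_a Q(s', a) for non-terminal s'.

    q_next is any callable mapping a state to an array of two Q-values,
    for example a frozen target network (hypothetical here).
    """
    targets = []
    for _, _, reward, next_state, done in transitions:
        bootstrap = 0.0 if done else gamma * float(np.max(q_next(next_state)))
        targets.append(reward + bootstrap)
    return np.array(targets)
```

Because both actions are recorded for every window, the dataset covers the full action space at each visited state, which is what allows a Q-network to be fitted without interacting with an environment.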