Keywords: sabotage, threat modelling, AI security, cybersecurity
TL;DR: We examine when states might sabotage AI training runs, present a threat modelling framework, and assess the plausibility of and mitigations for different attacks.
Abstract: Much attention has been given to the possibility that states will attempt to steal the model weights of advanced AI systems. We argue that in most situations, it is more likely that a state will attempt to sabotage the training of the models underpinning these systems. We present a threat modelling framework for sabotage of AI training, including both the necessary technical background and a taxonomy of strategic considerations and attack vectors. We then use this to examine different attacks and assess both their technical plausibility and the mitigations required to defend against them.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19592