Keywords: sabotage, threat modelling, AI security, cybersecurity
TL;DR: We examine when states might sabotage AI training runs, present a threat modelling framework, and assess the plausibility of and mitigations for different attacks.
Abstract: Much attention has been given to the possibility that states will attempt to steal the model weights of advanced AI systems. We argue that in most situations, it is more likely that a state will attempt to sabotage the training of the models underpinning these systems. We present a threat modelling framework for sabotage of AI training, including both the necessary technical background and a taxonomy of strategic considerations and attack vectors. We then use this to examine different attacks and assess both their technical plausibility and the mitigations required to defend against them.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19592