Locking Open Weight Models with Spectral Deformation

Published: 05 Jun 2025, Last Modified: 15 Jul 2025 · ICML 2025 Workshop TAIG Poster · CC BY 4.0
Keywords: Preconditioning, Open Weight Misuse, Robust Safeguards
TL;DR: Training of open-weight models can be made arbitrarily slow, without impacting their original behaviour, by manipulating their spectral properties; we use this mechanism to reduce the risk of open-weight release.
Abstract: Training open-weight foundation models for harmful purposes could be prevented if optimization were made arbitrarily slow. We find that loss landscape conditioning, which controls the convergence rate of gradient descent, can be modified using the spectral values of neural network weight matrices alone, resulting in an efficient iterative algorithm (Spectral Deformation) that can slow down training arbitrarily, to the point of infeasibility. We call this process "model locking" and show across modalities that our lock prevents key high-risk open-weight misuse: (1) unauthorized training, (2) backdoor injection, and (3) relearning attacks after unlearning. Training locks present new possibilities for AI governance, which we illustrate with a policy analysis drawing on parallels from copyright protection technology and anti-circumvention law.
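The abstract's premise is the classical link between loss landscape conditioning and the convergence rate of gradient descent: on a quadratic, the contraction factor per step is roughly (κ−1)/(κ+1), where κ is the condition number of the Hessian. The sketch below is not the paper's Spectral Deformation algorithm; it is a minimal, self-contained illustration of that underlying link, showing how inflating the spectrum of a toy quadratic loss makes gradient descent take vastly more steps to converge.

```python
import numpy as np

def gd_steps_to_converge(eigvals, tol=1e-6, max_steps=200_000):
    """Run gradient descent on f(x) = 0.5 * x^T diag(eigvals) x and
    return how many steps it takes for ||x|| to fall below tol."""
    lam = np.asarray(eigvals, dtype=float)
    lr = 2.0 / (lam.min() + lam.max())  # optimal fixed step size for this quadratic
    x = np.ones_like(lam)               # start away from the optimum at 0
    for step in range(1, max_steps + 1):
        x = x - lr * lam * x            # gradient of f is diag(eigvals) @ x
        if np.linalg.norm(x) < tol:
            return step
    return max_steps

well = gd_steps_to_converge([1.0, 2.0])     # condition number kappa = 2
ill = gd_steps_to_converge([1.0, 2000.0])   # condition number kappa = 2000
print(well, ill)  # the ill-conditioned problem needs orders of magnitude more steps
```

In this toy setting, widening the spectral range (raising κ) is the only change made, yet the step count grows roughly in proportion to κ; the paper's claim is that an analogous spectral manipulation of real weight matrices can push fine-tuning cost to infeasibility while leaving forward-pass behaviour intact.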
Submission Number: 13