Theoretical Limits of Provable Security Against Model Extraction by Efficient Observational Defenses
Keywords: Model Extraction, Model Stealing, Covert Learning, Provable Security, Adversarial Machine Learning
TL;DR: We prove that a large body of model extraction defense techinques are inadequate for a large body of machine learning models, under computational assumptions.
Abstract: Can we hope to provide provable security against model extraction attacks? As a step towards a theoretical study of this question, we unify and abstract a wide range of "observational" model extraction defenses (OMEDs) --- roughly, those that attempt to detect model extraction by analyzing the distribution over the adversary's queries. To accompany the abstract OMED, we define the notion of complete OMEDs --- when benign clients can freely interact with the model --- and sound OMEDs --- when adversarial clients are caught and prevented from reverse engineering the model. Our formalism facilitates a simple argument for obtaining provable security against model extraction by complete and sound OMEDs, using (average-case) hardness assumptions for PAC-learning, in a way that abstracts current techniques in the prior literature. The main result of this work establishes a partial computational incompleteness theorem for the OMED: any efficient OMED for a machine learning model computable by a polynomial size decision tree that satisfies a basic form of completeness cannot satisfy soundness, unless the subexponential Learning Parity with Noise (LPN) assumption does not hold. To prove the incompleteness theorem, we introduce a class of model extraction attacks called natural Covert Learning attacks based on a connection to the Covert Learning model of Canetti and Karchmer (TCC '21), and show that such attacks circumvent any defense within our abstract mechanism in a black-box, nonadaptive way. As a further technical contribution, we extend the Covert Learning algorithm of Canetti and Karchmer to work over any "concise" product distribution, by showing that the technique of learning with a distributional inverter of Binnendyk et al. (ALT '22) remains viable in the Covert Learning setting. Finally, we further expose the tension between Covert Learning and OMEDs by proving that the existence of Covert Learning algorithms requires the nonexistence of provable security via efficient OMEDs. Therefore, we observe a ``win-win" result, by obtaining a characterization of the existence of provable security via efficient OMEDs by the nonexistence of natural Covert Learning algorithms.