Keywords: Model extraction defense, ensemble methods, machine-learning-as-a-service
Abstract: Model extraction attacks aim to replicate the functionality of a black-box model through query access, threatening the intellectual property (IP) of machine-learning-as-a-service (MLaaS) providers. Defending against such attacks is challenging, as it must balance efficiency, robustness, and utility preservation in real-world scenarios. Despite recent advances, most existing defenses presume that attacker queries are out-of-distribution (OOD), enabling detection or disruption of suspicious inputs. However, this assumption is increasingly unreliable: modern models are trained on diverse datasets, and attackers often operate under limited query budgets. As a result, the effectiveness of these defenses is significantly compromised in realistic deployment settings. To address this gap, we propose MISLEADER (enseMbles of dIStiLled modEls Against moDel ExtRaction), a novel defense strategy that does not rely on OOD assumptions. MISLEADER formulates model protection as a bilevel optimization problem that simultaneously preserves predictive fidelity on benign inputs and reduces extractability by potential clone models. Our framework integrates data augmentation to simulate attacker queries and ensembles heterogeneous distilled models to enhance robustness and diversity. We further develop a tractable approximation algorithm and provide theoretical error bounds to characterize defense effectiveness. Extensive experiments across various settings validate the utility-preserving and extraction-resistant properties of our proposed defense strategy. Our code is available at https://anonymous.4open.science/r/misleader-B54B.
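To make the abstract's objective concrete, here is a minimal toy sketch (not the paper's actual MISLEADER algorithm; all model choices, degrees, and noise levels below are illustrative assumptions) of the two quantities a defense of this kind trades off: fidelity of an ensemble of distilled models to the victim on benign inputs, and the error of a clone fit from simulated (augmented) attacker queries.

```python
import numpy as np

# Hedged sketch: the "victim" and the distilled ensemble are stand-ins,
# chosen only to illustrate the utility-vs-extractability trade-off.

rng = np.random.default_rng(0)

def victim(x):
    # black-box model to protect (hypothetical)
    return np.sin(x)

def distill_ensemble(x_train, degrees=(1, 3, 5)):
    # heterogeneous distilled models: polynomial fits of different degrees
    y = victim(x_train)
    return [np.polyfit(x_train, y, d) for d in degrees]

def ensemble_predict(models, x):
    # serve the averaged ensemble output instead of the victim directly
    return np.mean([np.polyval(m, x) for m in models], axis=0)

x_benign = np.linspace(-1.0, 1.0, 50)
models = distill_ensemble(x_benign)

# utility term: how closely the served ensemble matches the victim on benign inputs
fidelity_err = np.mean((ensemble_predict(models, x_benign) - victim(x_benign)) ** 2)

# simulate attacker queries via data augmentation (noisy inputs),
# then fit a clone on the defender's responses to those queries
x_attack = x_benign + rng.normal(0.0, 0.3, x_benign.shape)
clone = np.polyfit(x_attack, ensemble_predict(models, x_attack), 3)
extract_err = np.mean((np.polyval(clone, x_benign) - victim(x_benign)) ** 2)

# a defense wants fidelity_err low (utility preserved) and extract_err high
print(fidelity_err, extract_err)
```

In the paper's framing, these two terms would sit in the outer and inner levels of a bilevel optimization over the ensemble's parameters; this sketch only evaluates them once for fixed distilled models.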
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 22300