CREDIT: Certified Defense of Deep Neural Networks against Model Extraction Attacks

Bolin Shen; Zhan Cheng; Neil Zhenqiang Gong; Fan Yao; Yushun Dong

Submitted to ICLR 2026, 18 Sept 2025 (modified: 11 Feb 2026). Readers: Everyone. License: CC BY 4.0

Keywords: Deep Neural Networks, Model Extraction Defense

Abstract: Machine Learning as a Service (MLaaS) has become a widely adopted method for delivering deep neural network (DNN) models, allowing users to conveniently access models via APIs. However, such services have been shown to be highly vulnerable to Model Extraction Attacks (MEAs). While numerous defense strategies have been proposed, verifying the ownership of a suspicious model with strict theoretical guarantees remains a challenging task. To address this gap, we introduce CREDIT, a certified defense against MEAs. Specifically, we employ mutual information to quantify the similarity between DNN models, propose a practical verification threshold, and provide rigorous theoretical guarantees for ownership verification based on this threshold. We extensively evaluate our approach on several mainstream datasets and achieve state-of-the-art performance. Our implementation is publicly available at https://anonymous.4open.science/r/CREDIT.
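To make the idea of threshold-based verification with mutual information concrete, the following is a minimal illustrative sketch only. It is not the paper's estimator, threshold, or guarantee: it assumes the two models are compared through their predicted class labels on a shared query set, uses a simple plug-in estimate of mutual information, and takes the threshold as a caller-supplied value. The names `mutual_information` and `flag_as_extracted` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Plug-in estimate of I(A; B) in nats from paired discrete labels.

    Assumption: each model is summarized by its predicted label on the
    same queries; the paper may use a different representation.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # counts of (a, b) pairs
    count_a = Counter(labels_a)               # marginal counts for model A
    count_b = Counter(labels_b)               # marginal counts for model B
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


def flag_as_extracted(victim_labels, suspect_labels, threshold):
    """Flag the suspect model when the estimated MI reaches the threshold.

    Assumption: `threshold` is chosen by the caller; this sketch carries
    none of the theoretical guarantees described in the abstract.
    """
    return mutual_information(victim_labels, suspect_labels) >= threshold
```

In this sketch, a suspect model that reproduces the victim's labels exactly yields an estimate equal to the entropy of those labels, while a suspect whose labels are statistically independent of the victim's yields an estimate of zero.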

Primary Area: alignment, fairness, safety, privacy, and societal considerations

Submission Number: 12160
