Abstract: Cloud-based machine learning services offer significant advantages but also introduce the risk of tampering with cloud-deployed deep neural network (DNN) models. Black-box integrity verification (BIV) allows model owners and end-users to determine whether a cloud-deployed DNN model has been tampered with by examining only the top-1 label responses. Fingerprinting generates fingerprint samples to query the model, achieving BIV with no impact on the model's accuracy. In this paper, we present BIVBench, the first comprehensive benchmark for BIV of DNN models. BIVBench covers 16 types of model modifications, providing extensive coverage of practical modification scenarios. Our analysis reveals that existing fingerprinting methods, which typically target significant tampering, lack the sensitivity needed to detect subtle yet common and potentially severe modifications. To address this limitation, we propose MiSentry (Model Integrity Sentry), a novel fingerprinting method that leverages meta-learning. MiSentry strategically incorporates a few subtly modified models into the meta-learning model zoo and maximizes the divergence of output predictions between the target model and the modified models in the zoo to generate highly sensitive, generalizable, and effective fingerprint samples. Extensive evaluations using BIVBench demonstrate that MiSentry outperforms existing state-of-the-art methods overall and significantly surpasses them in detecting subtle modifications. BIVBench and the supplementary materials are available at: https://github.com/CGCL-codes/BIVBench.
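The core idea sketched in the abstract, crafting an input that maximizes the prediction divergence between a target model and a subtly modified copy, can be illustrated with a minimal toy example. This is a hedged sketch, not the paper's actual method: the two linear "models", the KL-divergence objective, and the finite-difference gradient ascent are all illustrative stand-ins chosen for self-containment.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence between two categorical distributions (with smoothing).
    return float(np.sum(p * np.log((p + 1e-12) / (q + 1e-12))))

# Hypothetical toy stand-ins for the target model and a subtly modified copy:
# linear softmax classifiers whose weights differ by a small perturbation.
W_target = rng.normal(size=(3, 8))
W_modified = W_target + 0.05 * rng.normal(size=(3, 8))

def divergence(x):
    return kl(softmax(W_target @ x), softmax(W_modified @ x))

# Gradient ascent on the input (finite differences, for simplicity) to craft
# a fingerprint sample whose outputs separate the two models.
x = rng.normal(size=8)
eps, lr = 1e-4, 0.5
before = divergence(x)
for _ in range(200):
    grad = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        grad[i] = (divergence(x + d) - divergence(x - d)) / (2 * eps)
    x += lr * grad
after = divergence(x)
print(after > before)
```

The optimized sample yields a larger prediction gap than a random input, which is the property a fingerprint needs: querying both models with it is more likely to expose a label mismatch under even a subtle modification.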