Abstract: Machine learning as a service (MLaaS) has become a widely adopted paradigm, allowing customers to access even the most complex machine learning models through a pay-per-query interface. Black-box distribution is widely used to keep models secret in MLaaS. However, even though black-box distribution alleviates certain risks, a model's functionality can still be compromised once customers gain access to its predictions. To protect the intellectual property of model owners, we propose an effective defense against model stealing attacks based on localized stochastic sensitivity (LSS), named LSSMSD. First, suspicious queries are detected by an out-of-distribution (OOD) detector. Many existing defenses rely too heavily on OOD detection results, which degrades model fidelity; we address this issue by introducing LSS. By computing the LSS of suspicious queries, we selectively output misleading predictions for queries with high LSS through a misinformation mechanism. Extensive experiments demonstrate that LSSMSD robustly protects victim models against black-box proxy attacks such as Jacobian-based dataset augmentation and Knockoff Nets: it significantly reduces the accuracy of attackers' substitute models (by up to 77.94%) while having minimal impact on benign users' accuracy (average \(-2.72\%\)), thereby maintaining the fidelity of the victim model.
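The abstract outlines a three-step serving pipeline: flag suspicious queries with an OOD detector, estimate their LSS, and answer high-LSS suspicious queries with a misleading prediction. The sketch below illustrates that flow only; the `predict_proba` classifier, the norm-based `ood_score`, the perturbation-based LSS estimate, and the top-2 probability swap are all placeholder assumptions, not the paper's actual components or formulation.

```python
import numpy as np


def predict_proba(x):
    """Placeholder victim model: returns a probability vector over classes.
    In practice this would be the deployed MLaaS classifier."""
    logits = np.array([x.sum(), -x.sum(), x.mean()])
    e = np.exp(logits - logits.max())
    return e / e.sum()


def ood_score(x):
    """Placeholder OOD detector score; higher means more suspicious.
    Any OOD detection method could be plugged in here."""
    return float(np.linalg.norm(x))


def localized_stochastic_sensitivity(x, n_samples=50, radius=0.05, seed=0):
    """Estimate LSS at query x as the mean squared output deviation under
    small random perturbations in a local neighborhood (an assumed form of
    stochastic sensitivity; the paper's exact definition is not given here)."""
    rng = np.random.default_rng(seed)
    base = predict_proba(x)
    deviations = [
        np.sum((predict_proba(x + rng.uniform(-radius, radius, size=x.shape)) - base) ** 2)
        for _ in range(n_samples)
    ]
    return float(np.mean(deviations))


def answer_query(x, ood_threshold=5.0, lss_threshold=0.01):
    """Serve a query: benign queries receive the true prediction; suspicious
    queries with high LSS receive a misleading prediction."""
    probs = predict_proba(x)
    if ood_score(x) > ood_threshold and localized_stochastic_sensitivity(x) > lss_threshold:
        # Misinformation mechanism (illustrative): swap the top-2 class
        # probabilities so the returned label is wrong while the output
        # still looks like a valid probability distribution.
        probs = probs.copy()
        i, j = np.argsort(probs)[-2:]
        probs[i], probs[j] = probs[j], probs[i]
    return probs


if __name__ == "__main__":
    query = np.array([0.3, -1.2, 0.7, 2.0])
    print(answer_query(query))
```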