Keywords: auditing, zero-knowledge, privacy, decision trees
TL;DR: We propose and formalize attacks against a wide class of privacy-preserving ML audit approaches.
Abstract: Privacy-preserving auditing of machine learning models has emerged as a key research direction with growing real-world importance. Despite rapid progress, the field still lacks a unifying security foundation for evaluating proposed solutions. In this work, we identify a fundamental gap between the security models underlying many audit protocols—focused on interactions between prover (model owner) and verifier(s) (auditors)—and the guarantees one would naturally expect. We show how this gap enables a broad class of attacks, called data forging attacks, even against protocols with formal cryptographic proofs of security.
Crucially, prior works are not technically incorrect; rather, their guarantees fail to generalize to other datasets, even ones drawn from the same distribution as the audit dataset. This generalization step is typically not captured by the definitions of well-known cryptographic techniques such as zero-knowledge proofs.
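One way to make this gap concrete (the notation below is ours, not the paper's): a zero-knowledge audit typically proves a statement about the committed audit dataset, whereas the guarantee an auditor implicitly relies on concerns fresh data from the underlying distribution.

```latex
% Notation is ours, not the paper's: f is the audited model, D_aud the
% committed audit dataset, and \mathcal{D} the underlying data distribution.
% A ZK audit protocol soundly certifies a claim about D_aud only:
\mathrm{acc}(f, D_{\mathrm{aud}})
  = \frac{1}{|D_{\mathrm{aud}}|} \sum_{(x,y) \in D_{\mathrm{aud}}} \mathbf{1}[f(x) = y]
  \ \ge\ \tau,
% whereas the guarantee an auditor implicitly expects is distributional:
\Pr_{(x,y) \sim \mathcal{D}}\bigl[ f(x) = y \bigr] \ \ge\ \tau .
% Standard soundness definitions do not force the first statement to imply
% the second; data forging attacks exploit exactly this slack.
```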
We formalize this gap by introducing a general framework for modeling attacks on privacy-preserving audits. Using this framework, we demonstrate concrete data forging attacks across widely studied model classes. For example, a prover can falsely certify that a model is accurate (it achieves over 80\% accuracy on the audit dataset) while the same model reaches only 30\% accuracy in practice.
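As a minimal illustration of why such a gap is exploitable (a toy sketch of the general idea, not the paper's attack, and without the cryptographic machinery): if the prover can influence which points end up in the audit set, it can draw candidates from the legitimate data distribution and keep only those its model classifies correctly, so audited accuracy looks near-perfect even when fresh-data accuracy is far lower. The synthetic task, model, and selection step are our own assumptions.

```python
# Toy sketch: audit-set accuracy vs. fresh-data accuracy when the prover
# cherry-picks audit points from the legitimate distribution.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def sample(n):
    # Hard synthetic task: heavy label noise, so no tree generalizes well.
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] > 0).astype(int)
    flip = rng.random(n) < 0.45
    return X, np.where(flip, 1 - y, y)

X_train, y_train = sample(2000)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# "True" accuracy on fresh i.i.d. data from the same distribution.
X_fresh, y_fresh = sample(20000)
true_acc = model.score(X_fresh, y_fresh)

# Forged audit set: draw many candidates from the same distribution,
# but keep only points the model happens to classify correctly.
X_cand, y_cand = sample(20000)
mask = model.predict(X_cand) == y_cand
X_audit, y_audit = X_cand[mask][:1000], y_cand[mask][:1000]
audit_acc = model.score(X_audit, y_audit)

print(f"accuracy on fresh data      : {true_acc:.2f}")   # roughly 0.55 in this setup
print(f"accuracy on forged audit set: {audit_acc:.2f}")  # 1.00 by construction
```

This toy selection step does not model the soundness guarantees of the audited protocols; the paper's attacks target protocols with formal cryptographic proofs, where the forging must survive the protocol's checks.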
Our results highlight the need to revisit the foundations of privacy-preserving auditing frameworks. We hope that our work provides both cautionary evidence and constructive guidance for the design of secure ML auditing solutions.
Submission Number: 82