Data Forging Attacks on Cryptographic Model Certification

Published: 23 Sept 2025, Last Modified: 04 Dec 2025 · RegML 2025 Poster · CC BY 4.0
Keywords: auditing, zero-knowledge, privacy, decision trees
TL;DR: We propose and formalize attacks against a wide class of privacy-preserving ML audit approaches.
Abstract: Privacy-preserving machine learning auditing protocols allow auditors to assess models for properties such as fairness or robustness, without revealing their internals or training data. This makes them especially attractive for auditing models deployed in sensitive domains such as healthcare or finance. For these protocols to be truly useful, though, their guarantees must reflect how the model will behave once deployed, not just under the conditions of an audit. Existing security definitions often miss this mark: most certify model behavior only on a \emph{fixed audit dataset}, without ensuring that the same guarantees generalize to other datasets drawn from the same distribution. We show that a model provider can attack many cryptographic model certification schemes by forging training data, resulting in a model that exhibits benign behavior during an audit, but pathological behavior in practice. For example, we empirically demonstrate that an attacker can train a model that achieves over 99% accuracy on an audit dataset, but less than 30% accuracy on fresh samples from the same distribution.
Submission Number: 82
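The audit-vs-deployment gap described in the abstract can be illustrated with a toy experiment. The sketch below is not the paper's construction or its attack on cryptographic certification schemes; it only shows, under assumed choices (a synthetic labeling task, scikit-learn's DecisionTreeClassifier, and label flipping as the adversarial behavior), how forged training data can make a model near-perfect on a fixed audit dataset while failing on fresh samples from the same distribution.

```python
# Minimal sketch (assumptions, not the paper's method): forge training data so an
# unrestricted decision tree memorizes the fixed audit set but follows
# adversarially flipped labels almost everywhere else.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def true_label(X):
    # Hypothetical ground-truth task: label 1 iff the feature sum is positive.
    return (X.sum(axis=1) > 0).astype(int)

X_audit = rng.normal(size=(500, 5))    # fixed audit dataset known to the provider
y_audit = true_label(X_audit)
X_fresh = rng.normal(size=(5000, 5))   # fresh samples from the same distribution
y_fresh = true_label(X_fresh)

# Forged training data: audit points keep their correct labels, while a much
# larger cloud of other points is labeled adversarially (labels flipped).
X_forged = rng.normal(size=(20000, 5))
y_forged = 1 - true_label(X_forged)
X_train = np.vstack([X_audit, X_forged])
y_train = np.concatenate([y_audit, y_forged])

# A fully grown tree fits the training set exactly, so it is correct on the
# audit points but mostly inherits the flipped labels on fresh inputs.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("audit accuracy:", accuracy_score(y_audit, model.predict(X_audit)))
print("fresh accuracy:", accuracy_score(y_fresh, model.predict(X_fresh)))
```

The exact accuracies vary with the random seed and are not the paper's reported numbers; the point is only that high accuracy on the audit set need not carry over to the deployment distribution.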