Robust ML Auditing using Prior Knowledge

Published: 01 May 2025, Last Modified: 18 Jun 2025 | ICML 2025 Spotlight Poster | CC BY 4.0
Abstract: Among the many technical challenges to enforcing AI regulations, one crucial yet underexplored problem is the risk of audit manipulation. This manipulation occurs when a platform deliberately alters its answers to a regulator to pass an audit without modifying its answers to other users. In this paper, we introduce a novel approach to manipulation-proof auditing that takes into account the auditor's prior knowledge of the task solved by the platform. We first demonstrate that regulators must not rely on public priors (e.g., a public dataset), as platforms could easily fool the auditor in such cases. We then formally establish the conditions under which an auditor can prevent audit manipulations using prior knowledge about the ground truth. Finally, our experiments with two standard datasets illustrate the maximum level of unfairness a platform can hide before being detected as malicious. Our formalization and generalization of manipulation-proof auditing with a prior open up new research directions for more robust fairness audits.
Lay Summary: Do you remember Dieselgate? The car computer would detect when it was on a test bench and reduce the engine power to fake environmental compliance. Well, this can happen in AI too. An AI audit is pretty straightforward. 1/ I, the auditor, come up with questions to ask your model. 2/ You, the platform, answer my questions. 3/ I look at your answers and decide whether your system abides by the law by computing a series of aggregate metrics. Now, you know the metric. You know the questions. Worst of all, I don’t have access to your model. Thus, nothing prevents you from manipulating the answers of your model to pass the audit. Researchers have shown that this is very easy! In this paper, we formalize a method to avoid manipulations as a search for efficient “audit priors”. We instantiate our framework with a simple idea: just look at the accuracy of the platform’s answers. Our experiments show that this can help reduce the amount of unfairness a platform could hide.
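To make the lay summary's idea concrete, here is a minimal, hedged sketch (not the paper's implementation or the linked repository's API) of an audit that combines a fairness metric with an accuracy check against ground-truth labels the auditor already knows. The function name, the demographic-parity metric, and the thresholds are illustrative assumptions only.

```python
# Illustrative sketch (assumptions, not the paper's method): an audit that
# flags a platform when its answers are either unfair OR suspiciously
# inaccurate relative to ground truth the auditor knows (the "audit prior").
import numpy as np

def audit(answers, groups, true_labels, dp_tol=0.1, acc_floor=0.85):
    """Return fairness gap, accuracy, and a pass/fail decision.

    answers     : binary answers returned by the platform on the audit queries
    groups      : protected-group membership (0/1) of each audit query
    true_labels : ground-truth labels known to the auditor for these queries
    dp_tol      : maximum tolerated demographic-parity gap (assumed threshold)
    acc_floor   : minimum accuracy consistent with an honest, useful model
    """
    answers, groups, true_labels = map(np.asarray, (answers, groups, true_labels))

    # Demographic-parity gap: difference in positive-answer rates across groups.
    dp_gap = abs(answers[groups == 1].mean() - answers[groups == 0].mean())

    # Accuracy against the auditor's prior knowledge of the ground truth.
    # A platform that flips answers just to look fair must pay for it here.
    accuracy = (answers == true_labels).mean()

    return {"dp_gap": dp_gap,
            "accuracy": accuracy,
            "fail": bool(dp_gap > dp_tol or accuracy < acc_floor)}

# Toy usage: answering "fairly" but at random passes the parity check
# yet fails the accuracy check, so the manipulation is detected.
rng = np.random.default_rng(0)
groups = rng.integers(0, 2, size=1000)
true_labels = rng.integers(0, 2, size=1000)
print(audit(true_labels, groups, true_labels))                      # honest
print(audit(rng.integers(0, 2, size=1000), groups, true_labels))    # manipulated
```

The sketch only illustrates the intuition stated above: prior knowledge of the ground truth bounds how much a platform can distort its answers to the auditor before the distortion itself becomes detectable.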
Primary Area: General Machine Learning->Evaluation
Keywords: ML audit, ML theory, fairness, fairwashing
Link To Code: https://github.com/grodino/merlin
Submission Number: 7115