Plausible Deniability Guarantees for Whistleblowers

Published: 23 May 2026, Last Modified: 23 May 2026, Venue: ICML 2026 AIWILD, License: CC BY 4.0
Keywords: AI governance, privacy, auditing, safety, accountability, responsible AI, whistleblowing, algorithmic auditing
Abstract: Whistleblowers are a key safeguard against organizational wrongdoing, but the threat of retaliation deters reporting. Existing whistleblower-protection proposals lack formal privacy guarantees, and existing differential privacy mechanisms do not directly target the natural threat model, in which the audited organization itself observes auditor selection decisions and uses them to identify reporters. We formalize this setting as per-report $(0, \delta)$-differential privacy on the transcript of audit selections under a strong-adversary threat model. Within this framework we prove that the standard approach, randomized response applied at the selection step, must approach uniform random auditing at any fixed $(0, \delta)$ level as the horizon grows. We then give a generic mechanism that reduces private auditing to private continual counting: any $(0, \delta)$-DP continual counter plugs in by post-processing, and the audit transcript inherits the same per-report guarantee. Instantiating the reduction with a recent continual-counting mechanism yields per-report $(0, \delta)$-DP with noise scaling as $O(\sqrt{\log T})$ across a horizon of $T$ audit decisions. A utility theorem shows that the selection error vanishes whenever the noisy report gap between the most-reported organization and the runner-up grows faster than $\sqrt{\log T}$. Simulations show a substantial improvement over randomized response. Code to reproduce all experiments is available in the anonymized supplement.
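Illustrative sketch (not the authors' code): the reduction described in the abstract can be pictured as one continual counter per organization, with each audit selection computed only from the counters' released outputs. The sketch assumes the selection rule is an argmax over noisy counts, which the abstract suggests but does not state. `NoisyPrefixCounter`, `audit_transcript`, and `noise_scale` are hypothetical names. The stand-in counter adds fresh Gaussian noise at every step and carries no calibrated $(0, \delta)$ guarantee; a real instantiation would swap in a proven $(0, \delta)$-DP continual counter.

```python
import random
from typing import Callable, List, Optional, Sequence


class NoisyPrefixCounter:
    """Hypothetical stand-in for a private continual counter.

    Releases the running sum plus fresh Gaussian noise at each step.
    This is NOT a calibrated (0, delta)-DP mechanism; it only marks
    the place where a proven private counter would plug in.
    """

    def __init__(self, noise_scale: float, rng: random.Random) -> None:
        self.noise_scale = noise_scale
        self.rng = rng
        self.total = 0

    def update(self, increment: int) -> float:
        # Add the new report (0 or 1) and release a noisy running count.
        self.total += increment
        return self.total + self.rng.gauss(0.0, self.noise_scale)


def audit_transcript(
    reports: Sequence[Optional[int]],
    n_orgs: int,
    make_counter: Callable[[], NoisyPrefixCounter],
) -> List[int]:
    """Return one audit selection per time step.

    reports[t] is the index of the organization reported at step t,
    or None if no report arrived. Each selection depends only on the
    counters' released outputs, so it is a post-processing of them.
    """
    counters = [make_counter() for _ in range(n_orgs)]
    transcript = []
    for reported in reports:
        noisy = [
            counter.update(1 if org == reported else 0)
            for org, counter in enumerate(counters)
        ]
        # Assumed selection rule: audit the organization with the
        # largest noisy count (ties go to the lowest index).
        transcript.append(max(range(n_orgs), key=noisy.__getitem__))
    return transcript


rng = random.Random(0)
selections = audit_transcript(
    reports=[0, 1, 1, None, 1],
    n_orgs=2,
    make_counter=lambda: NoisyPrefixCounter(noise_scale=0.5, rng=rng),
)
```

Because `audit_transcript` never reads the true counts, any privacy guarantee that holds for the counter outputs carries over to the transcript by post-processing; the choice of counter then determines how much noise the argmax has to tolerate.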
Track: Regular Paper (9 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 34