OptiFluence: Scalable and Principled Design of Privacy Canaries

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: privacy auditing, differential privacy, optimization, influence functions
Abstract: Privacy auditing has emerged as a practical tool for empirically estimating training data leakage in machine learning models, in contrast to the provable but often overly pessimistic bounds provided by a differential privacy analysis. A common strategy is to use membership inference attacks to detect the presence of specific canaries (points designed to maximize memorization) in the training data. However, existing canary designs are largely heuristic, relying on mislabeled or out-of-distribution samples. We address this gap by formulating canary design as a bilevel optimization problem: the model is trained in the inner loop, and the canary is optimized in the outer loop to maximize its detectability. Building on this view, we develop OptiFluence, a scalable framework that combines two components: (i) influence-based pre-selection to identify promising canary seeds; and (ii) unrolled sample optimization with memory-efficient gradient techniques. Our approach achieves strong empirical performance on two standard privacy auditing datasets, MNIST and CIFAR-10: optimized canaries are up to 415× more detectable than in-distribution baselines, reaching a near-perfect detection rate of 99.5% TPR at 0.1% FPR. Critically, these canaries transfer effectively across model architectures without retraining, enabling practical third-party privacy audits: regulators and auditors can assess a model's privacy without access to proprietary training infrastructure or substantial computational resources.
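To make the bilevel formulation concrete, here is a minimal sketch of unrolled canary optimization. Everything in it is an illustrative assumption rather than the paper's code: a toy linear classifier on random data, an unroll depth of 10 inner SGD steps, and post-training canary loss as a simple proxy for detectability. The full framework would additionally seed the canary via influence-based pre-selection and use memory-efficient gradient techniques instead of this naive full unroll.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, C = 256, 32, 10                       # samples, feature dim, classes (toy sizes)
X, y = torch.randn(N, D), torch.randint(0, C, (N,))

# Canary seed; in the full framework this would be an influence-selected
# training point rather than random noise.
canary_x = torch.randn(1, D, requires_grad=True)
canary_y = torch.randint(0, C, (1,))
outer_opt = torch.optim.Adam([canary_x], lr=0.05)

def loss_fn(w, b, xb, yb):
    return F.cross_entropy(xb @ w + b, yb)

for outer_step in range(20):                # outer loop: optimize the canary
    # Inner loop: unrolled SGD on a fresh linear model, canary in every batch.
    w = torch.zeros(D, C, requires_grad=True)
    b = torch.zeros(C, requires_grad=True)
    for _ in range(10):                     # unroll depth (assumption)
        idx = torch.randperm(N)[:63]
        xb = torch.cat([X[idx], canary_x])
        yb = torch.cat([y[idx], canary_y])
        loss = loss_fn(w, b, xb, yb)
        gw, gb = torch.autograd.grad(loss, (w, b), create_graph=True)
        w, b = w - 0.1 * gw, b - 0.1 * gb

    # Outer objective: a crude detectability proxy -- drive the trained
    # model's loss on the canary toward zero so membership is easy to infer.
    audit_loss = loss_fn(w, b, canary_x, canary_y)
    outer_opt.zero_grad()
    audit_loss.backward()                   # gradients flow through the unroll
    outer_opt.step()
```

Note the `create_graph=True` in the inner loop: it keeps the computation graph of the unrolled training run alive so that the outer backward pass can differentiate the audit objective with respect to the canary pixels through every inner SGD step.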
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23058