Proof of Forgeability: Universal Repudiation against Membership Inference Attacks

ICLR 2026 Conference Submission 19112 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Privacy Protection, AI safety
TL;DR: Proof-of-Forgeability uses a member-like signal estimator to imperceptibly perturb non-members so that SOTA membership inference attacks are fooled across datasets and settings.
Abstract: Membership inference attacks (MIAs) aim to infer whether a data point was used to train a target model and are widely used to audit the privacy of machine learning (ML) models. In this work, we present a new approach to asserting repudiation evidence against MIA-supported claims. Existing strategies require computationally intensive, case-by-case proofs. We introduce Proof of Forgeability (PoF), which denies all membership claims with a universal repudiation. The key idea is to generate forged examples that are non-members yet are misclassified as members by MIAs. We construct forged examples by adding carefully designed perturbations to non-members so that the attack-signal distribution derived from model outputs for the forged examples matches that of members. To achieve this, we use quantile matching to derive a member-like signal estimator (MLSE) that maps each non-member's signal to a target member-like signal. We prove the optimality of this MLSE and derive closed-form expressions when the attack signal is the logit-scaled true-label confidence. We then apply a first-order Taylor expansion of the signal with respect to the input to bridge the input and signal spaces; this relation converts the target signal change into an input perturbation and yields the designed perturbation in closed form. Empirical results demonstrate that the forged examples confuse MIAs just as genuine members do; meanwhile, they differ imperceptibly from the original non-members and fully preserve data utility.
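
The abstract outlines a two-step pipeline: (1) quantile matching maps a non-member's attack signal to a member-like target, and (2) a first-order Taylor expansion converts the required signal change into an input perturbation. Below is a minimal PyTorch sketch of that pipeline under stated assumptions: the attack signal is the logit-scaled true-label confidence, the MLSE is an empirical quantile match, and the perturbation is the minimum-norm solution of the linearized signal equation. All function names (`attack_signal`, `mlse_quantile_match`, `forge`) are illustrative and not from the paper's released code.

```python
import torch
import torch.nn.functional as F

def attack_signal(model, x, y):
    """Logit-scaled true-label confidence: s = log(p_y / (1 - p_y))."""
    p = F.softmax(model(x), dim=-1)
    p_y = p.gather(-1, y.unsqueeze(-1)).squeeze(-1).clamp(1e-6, 1 - 1e-6)
    return torch.log(p_y) - torch.log1p(-p_y)

def mlse_quantile_match(s, nonmember_signals, member_signals):
    """Empirical quantile matching: find s's rank among non-member
    signals, then return the member signal at the same quantile."""
    q = (nonmember_signals < s).float().mean().item()
    return torch.quantile(member_signals, q)

def forge(model, x, y, nonmember_signals, member_signals):
    """Forge a non-member (x, y): linearize the signal around x and take
    the minimum-norm step delta = (t - s) * grad / ||grad||^2 so that
    s(x + delta) ~= t, the member-like target signal."""
    x = x.clone().requires_grad_(True)
    s = attack_signal(model, x.unsqueeze(0), y.unsqueeze(0)).squeeze()
    (grad,) = torch.autograd.grad(s, x)
    t = mlse_quantile_match(s.detach(), nonmember_signals, member_signals)
    delta = (t - s.detach()) * grad / grad.pow(2).sum().clamp_min(1e-12)
    return (x + delta).detach()

# Usage (hypothetical tensors): x is one non-member input, y its true
# label, and the *_signals tensors hold reference attack signals.
# x_forged = forge(model, x, y, nonmember_signals, member_signals)
```

Under the linearization s(x + δ) ≈ s(x) + ⟨∇s, δ⟩, the minimum-norm δ reaching the target t is (t − s)∇s/‖∇s‖²; choosing the smallest such step is one plausible way to keep the perturbation imperceptible, consistent with the abstract's claim.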
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 19112