Keywords: model inversion attribute inference, privacy, disparate vulnerability
TL;DR: We design a novel black-box model inversion attribute inference attack that assumes the least adversarial capabilities to date, and analyze its disparate vulnerability property in detail.
Abstract: In this paper, we study model inversion attribute inference (MIAI), a machine learning (ML) privacy attack that aims to infer sensitive information about the training data given access to the target ML model. We design a novel black-box MIAI attack that assumes the least adversary knowledge and capabilities to date while still performing similarly to state-of-the-art attacks. Further, we extensively analyze the disparate vulnerability property of our proposed MIAI attack, i.e., the elevated vulnerability of specific groups in the training dataset (grouped by gender, race, etc.) to model inversion attacks. First, we investigate two existing ML privacy defense techniques, (1) mutual information regularization and (2) fairness constraints, and show that neither technique can mitigate MIAI disparity. Second, we empirically identify possible disparity factors and discuss potential ways to mitigate disparity in MIAI attacks. Finally, we support our findings by extensively evaluating our attack on estimating binary and multi-class sensitive attributes against three different target models trained on three real datasets.