Abstract: Differential privacy quantifies privacy through the privacy budget ε, yet its practical interpretation is complicated by variations across models and datasets. Recent research on differentially private machine learning and membership inference has highlighted that, under the same theoretical ε setting, the attack success rate (ASR) of the likelihood-ratio-based membership inference attack (LiRA) may vary across specific datasets and models, which makes ASR a potentially better indicator of real-world privacy risks. Inspired by this practical privacy measure, we study the positive correlation between the ε setting and ASR. We also find that, for a specific dataset and task, we can lower the attack success rate by modifying the dataset, which may enable more flexible privacy budget settings in model training. One dataset modification strategy is to selectively suppress privacy-sensitive features without significantly damaging application-specific data utility. We use the SHAP (or LIME) model explainer to evaluate each feature's privacy sensitivity and utility importance and develop an optimized feature-masking algorithm. Extensive experiments show (1) the inherent link between ASR and the dataset's privacy risk with respect to a specific modeling task, and (2) that by carefully selecting features to mask, we can preserve more data utility with equivalent practical privacy protection under relaxed ε settings. The implementation details are shared online at https://github.com/RhincodonE/On-sensitive-features-and-empirical-epsilon-lower-bounds.
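To make the feature-masking idea concrete, the following Python snippet is a minimal sketch (not the authors' implementation; see the linked repository for that). It assumes a synthetic dataset, an illustrative `privacy_model` proxy, and a hypothetical `mask_fraction` hyperparameter, and only illustrates the structure of ranking features by SHAP importance and masking the most privacy-sensitive, least utility-relevant ones.

```python
# Minimal sketch (assumptions, not the paper's implementation): rank features by
# SHAP importance for a task "utility" model and for a proxy "privacy" model,
# then mask features that look privacy-sensitive but contribute little utility.
# `privacy_model`, `mask_fraction`, and the synthetic labels are illustrative.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Utility model: trained on the actual task labels.
utility_model = RandomForestClassifier(random_state=0).fit(X, y)

# Privacy proxy model: stand-in trained on a hypothetical sensitive label;
# the paper measures practical privacy risk via LiRA ASR, not this shortcut.
sensitive = np.random.RandomState(0).randint(0, 2, size=len(y))
privacy_model = RandomForestClassifier(random_state=0).fit(X, sensitive)

def mean_abs_shap(model, data):
    """Global importance: mean |SHAP value| per feature."""
    values = shap.TreeExplainer(model).shap_values(data)
    if isinstance(values, list):        # older shap versions: one array per class
        values = np.stack(values, axis=-1)
    values = np.abs(values)
    if values.ndim == 3:                # (samples, features, classes)
        values = values.mean(axis=-1)
    return values.mean(axis=0)          # shape: (n_features,)

utility_imp = mean_abs_shap(utility_model, X)
privacy_imp = mean_abs_shap(privacy_model, X)

# Mask the top-k features with high privacy sensitivity but low utility importance.
mask_fraction = 0.2                     # assumed hyperparameter
k = int(mask_fraction * X.shape[1])
score = privacy_imp - utility_imp
to_mask = np.argsort(score)[::-1][:k]

X_masked = X.copy()
X_masked[:, to_mask] = X[:, to_mask].mean(axis=0)   # suppress by replacing with column means
print("Masked feature indices:", sorted(to_mask.tolist()))
```

In the paper's setting, the privacy-sensitivity signal comes from membership-inference behavior rather than a proxy classifier; the sketch only shows the SHAP-ranking-and-masking structure.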
External IDs: dblp:conf/bigdataconf/GuC24