Keywords: Membership Inference Attack, Privacy
Abstract: Machine learning (ML) models are susceptible to membership inference attacks (MIAs), where adversaries attempt to determine whether a specific data point is part of the model's training data. Recent studies suggest that MIAs often exploit the model’s overconfidence in predicting training samples, albeit using various proxy indicators. To mitigate this vulnerability, we introduce Adaptive Logit Scaling (ALS) loss, a simple yet effective modification to the standard Cross-Entropy loss. ALS adaptively constrains the norm of the output logits for each sample during training by decoupling and dynamically scaling overly large logits based on their magnitudes. The proposed approach reduces the models' overconfidence and ensures that they produce less distinguishable output metrics between member and non-member data. Extensive evaluations across four benchmark datasets show that ALS consistently achieves strong membership privacy while maintaining high model accuracy. Further comparisons with eight state-of-the-art defenses demonstrate that ALS effectively optimizes both sides of the privacy-utility trade-off, offering an effective and practical defense against MIAs.
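The abstract's core mechanism, capping each sample's logit norm so the model cannot become arbitrarily confident on training points, can be sketched as follows. This is a minimal illustration of the general idea, not the paper's implementation; the function names (`als_logits`, `cross_entropy`) and the threshold `tau` are assumptions introduced here, and the actual ALS loss may decouple and scale logits differently.

```python
import numpy as np

def als_logits(logits, tau=3.0):
    # Hypothetical sketch: if a sample's logit vector exceeds the
    # norm threshold tau, rescale it down to norm tau, limiting the
    # confidence the model can express on that sample. The real ALS
    # scaling rule may differ.
    logits = np.asarray(logits, dtype=float)
    norm = np.linalg.norm(logits)
    if norm > tau:
        logits = logits * (tau / norm)
    return logits

def cross_entropy(logits, label):
    # Standard cross-entropy over softmax probabilities.
    z = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return -np.log(probs[label])

# A sample with very large logits gets scaled, so its training loss
# stays higher than under plain cross-entropy, curbing overconfidence.
raw = np.array([10.0, 0.0, 0.0])
scaled = als_logits(raw, tau=3.0)
loss_raw = cross_entropy(raw, 0)
loss_als = cross_entropy(scaled, 0)
```

Under this sketch, the scaled loss for a confidently predicted member sample is strictly larger than the unscaled one, which is the mechanism the abstract credits for making member and non-member outputs less distinguishable.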
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 14947