Keywords: memorization, deep learning, data privacy, machine learning
TL;DR: We propose over-memorization with dummy data, which mitigates unintended memorization of training data and thereby reduces privacy risks.
Abstract: With the advances in deep learning, the privacy concerns surrounding deep neural networks are in the limelight. A particular concern is the privacy of the training data, which is often compromised by the model's inherent memorization capabilities. Suppressing such memorization can enhance privacy but introduces two main challenges: 1) removing a memorized instance from the training dataset causes the model to memorize another instance instead, and 2) memorization is essential for reducing the generalization error. To address these challenges, we propose an over-memorization method that trains the model on both the standard training set and a set of redundant, non-sensitive instances. Our method leverages the model's limited memorization capacity by directing it toward irrelevant data, thereby preventing the model from memorizing the training data. Our empirical results demonstrate that this method not only enhances protection against membership inference attacks but also minimizes the loss of utility by effectively redirecting the model's generalization efforts towards non-sensitive instances.
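The data-preparation step described in the abstract (training on the standard set together with redundant, non-sensitive instances) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `build_overmemorization_set`, the choice of uniform-noise inputs with random labels as the dummy instances, and the parameters `num_dummy` and `num_classes` are all assumptions introduced here; the abstract does not specify how the dummy data is constructed.

```python
import numpy as np


def build_overmemorization_set(train_x, train_y, num_dummy, num_classes, seed=0):
    """Append dummy instances to a training set (hypothetical helper).

    Assumption: dummy instances are uniform noise with random labels.
    The source abstract only says they are redundant and non-sensitive.
    """
    rng = np.random.default_rng(seed)

    # Dummy inputs share the per-example shape and dtype of the real inputs.
    dummy_shape = (num_dummy,) + train_x.shape[1:]
    dummy_x = rng.uniform(0.0, 1.0, size=dummy_shape).astype(train_x.dtype)

    # Random labels give the model nothing to generalize from,
    # so fitting these instances requires memorizing them.
    dummy_y = rng.integers(0, num_classes, size=num_dummy).astype(train_y.dtype)

    x = np.concatenate([train_x, dummy_x], axis=0)
    y = np.concatenate([train_y, dummy_y], axis=0)

    # Mask marking which rows are dummy, so they can be excluded from evaluation.
    is_dummy = np.concatenate(
        [np.zeros(len(train_x), dtype=bool), np.ones(num_dummy, dtype=bool)]
    )

    # Shuffle so real and dummy instances are interleaved within mini-batches.
    perm = rng.permutation(len(x))
    return x[perm], y[perm], is_dummy[perm]
```

The combined arrays can then be passed to any standard training loop in place of the original training set; the `is_dummy` mask is returned only so that utility and membership inference evaluations can be restricted to the real instances.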
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9322