Keywords: Privacy, Membership inference attack, Differential privacy, Transformer, ViT, BERT, Language Model
Abstract: Continual fine-tuning of large pre-trained models is now ubiquitous in industry for adapting a model to freshly collected user data. Existing privacy protection practices assume earlier training data is less sensitive and thus focus on the latest arriving samples. We challenge this assumption by tracking per-sample membership-inference risk across sequential fine-tuning rounds of popular transformer-based models: ViT for image data and BERT for text data. Our experiments reveal the \emph{Privacy Déjà Vu Effect}: new data can \emph{remind} the model of semantically similar legacy samples, substantially elevating their privacy risk. We further demonstrate that this resurgence is closely correlated with the latent-feature-space similarity between old and new examples. These findings underscore the need for more comprehensive privacy protection mechanisms in continual fine-tuning. We have published our code at \url{https://anonymous.4open.science/r/Privacy-Deja-vu-Effect-F006/README.md}.
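To make the two quantities in the abstract concrete, below is a minimal sketch (not the paper's actual protocol) of a loss-based membership-inference score and the latent-feature cosine similarity between legacy and newly arrived samples; the model, feature encoder, and batches are hypothetical placeholders.

```python
# Minimal sketch, assuming a generic PyTorch classifier and feature encoder.
# This is an illustrative approximation, not the authors' exact measurement code.
import torch
import torch.nn.functional as F

@torch.no_grad()
def membership_scores(model, samples, labels):
    """Loss-based membership score: lower per-sample loss -> more member-like."""
    model.eval()
    logits = model(samples)
    losses = F.cross_entropy(logits, labels, reduction="none")
    return -losses  # negate so larger values indicate higher inferred membership risk

@torch.no_grad()
def feature_similarity(encoder, old_batch, new_batch):
    """Cosine similarity between latent features of legacy and new samples."""
    old_feats = F.normalize(encoder(old_batch), dim=-1)   # (N_old, d)
    new_feats = F.normalize(encoder(new_batch), dim=-1)   # (N_new, d)
    return old_feats @ new_feats.T                        # (N_old, N_new)

# Possible usage: after each fine-tuning round, recompute membership_scores on the
# legacy set and check whether score increases align with each legacy sample's
# maximum similarity to the new data, e.g. feature_similarity(...).max(dim=1).values.
```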
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 9697