Abstract: Fine-tuning Large Language Models (LLMs) on sensitive datasets poses a significant risk of unintended memorization and leakage of Personally Identifiable Information (PII), potentially violating privacy regulations and endangering individuals.
In this work, we examine how fine-tuning can expose PII that appears only in the inputs, not in the training targets, highlighting a critical and underexplored vulnerability in real-world applications.
Using both synthetic and real-world datasets, we design controlled extraction probes to evaluate PII memorization and analyze how factors such as language, domain, task type, and dataset size affect memorization behavior.
Additionally, we benchmark four privacy-preserving methods: differential privacy, machine unlearning, regularization, and preference alignment.
Our findings show that post-training methods yield more consistent privacy–utility trade-offs, while differential privacy achieves the strongest leakage reduction in specific cases, albeit with training instability.
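The abstract mentions controlled extraction probes for measuring PII memorization. Below is a minimal, illustrative sketch of that general idea, not the paper's released code: the checkpoint path, prompt format, PII records, and exact-match leakage criterion are all assumptions for demonstration.

```python
# Minimal sketch (illustrative only): probe a fine-tuned model for PII that
# appeared in training inputs by prompting with the surrounding context and
# checking whether the PII span is reproduced verbatim.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/fine-tuned-model"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each record pairs the context preceding a PII span with the span itself.
pii_records = [
    {"prefix": "Patient name:", "pii": "Jane Example"},  # illustrative record
]

leaked = 0
for rec in pii_records:
    inputs = tokenizer(rec["prefix"], return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32, do_sample=False)
    completion = tokenizer.decode(output[0], skip_special_tokens=True)
    if rec["pii"] in completion:  # exact-match criterion; one of several options
        leaked += 1

print(f"Leakage rate: {leaked / len(pii_records):.2%}")
```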
Paper Type: Short
Research Area: Language Modeling
Research Area Keywords: LLM/AI agents, security and privacy, fine-tuning, safety and alignment, robustness, prompting
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, German, Spanish, Swedish, Dutch, French, Italian
Reassignment Request Area Chair: This is not a resubmission
Reassignment Request Reviewers: This is not a resubmission
Software: zip
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: Section Ethics Statement
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: Section 3, B, C
B2 Discuss The License For Artifacts: Yes
B2 Elaboration: MIT license for code
B3 Artifact Use Consistent With Intended Use: Yes
B3 Elaboration: Section Ethics Statement
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: Section 3.1, B
B5 Documentation Of Artifacts: Yes
B5 Elaboration: Brief readme for code in supplementary materials
B6 Statistics For Data: Yes
B6 Elaboration: Section 3, B
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: Section C
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: Section 4, C
C3 Descriptive Statistics: Yes
C3 Elaboration: Section 4, A
C4 Parameters For Packages: Yes
C4 Elaboration: Section 3, C, D
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: Section B; we annotated our own dataset
D2 Recruitment And Payment: N/A
D3 Data Consent: Yes
D3 Elaboration: Section Ethics Statement, B
D4 Ethics Review Board Approval: Yes
D4 Elaboration: Section Ethics Statement
D5 Characteristics Of Annotators: No
D5 Elaboration: The authors performed the annotation themselves; reporting annotator characteristics would breach anonymity.
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: No
E1 Elaboration: AI assistants were used only for basic coding and for rephrasing language during writing; no research content was generated by AI. We carefully reviewed all output and take full responsibility.
Author Submission Checklist: Yes
Submission Number: 1262