Abstract: The widespread adoption of Large Language Models (LLMs) has raised significant privacy concerns regarding the exposure of personally identifiable information (PII) in user prompts. To address this challenge, we propose a query-unrelated PII masking strategy and introduce PII-Bench, the first comprehensive evaluation framework for assessing privacy protection systems. PII-Bench comprises 2,842 test samples across 55 fine-grained PII categories, featuring diverse scenarios from single-subject descriptions to complex multi-party interactions. Each sample is carefully crafted with a user query, context description, and standard answer indicating query-relevant PII. Our empirical evaluation reveals that while current models perform adequately in basic PII detection, they show significant limitations in determining PII query relevance. Even state-of-the-art LLMs struggle with this task, particularly in handling complex multi-subject scenarios, indicating substantial room for improvement in achieving intelligent PII masking.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Benchmark, Data Privacy, NLP datasets
Contribution Types: Data resources
Languages Studied: English
Previous URL: https://openreview.net/forum?id=Ns3TI7xltP
Explanation Of Revisions PDF: pdf
Reassignment Request Area Chair: No, I want the same area chair from our previous submission (subject to their availability).
Reassignment Request Reviewers: No, I want the same set of reviewers from our previous submission (subject to their availability).
A1 Limitations Section: This paper has a limitations section.
A2 Potential Risks: Yes
A2 Elaboration: 6
B Use Or Create Scientific Artifacts: Yes
B1 Cite Creators Of Artifacts: Yes
B1 Elaboration: 4.1
B2 Discuss The License For Artifacts: N/A
B3 Artifact Use Consistent With Intended Use: N/A
B4 Data Contains Personally Identifying Info Or Offensive Content: Yes
B4 Elaboration: 3.2.1
B5 Documentation Of Artifacts: Yes
B5 Elaboration: 3
B6 Statistics For Data: Yes
B6 Elaboration: 3.3
C Computational Experiments: Yes
C1 Model Size And Budget: Yes
C1 Elaboration: 4.1
C2 Experimental Setup And Hyperparameters: Yes
C2 Elaboration: 4.1
C3 Descriptive Statistics: Yes
C3 Elaboration: 4
C4 Parameters For Packages: Yes
C4 Elaboration: 4
D Human Subjects Including Annotators: Yes
D1 Instructions Given To Participants: Yes
D1 Elaboration: E
D2 Recruitment And Payment: Yes
D2 Elaboration: C
D3 Data Consent: Yes
D3 Elaboration: C
D4 Ethics Review Board Approval: Yes
D4 Elaboration: 7
D5 Characteristics Of Annotators: Yes
D5 Elaboration: 3.2.4; 3.4
E Ai Assistants In Research Or Writing: Yes
E1 Information About Use Of Ai Assistants: N/A
Author Submission Checklist: yes
Submission Number: 213