RedacBench: Can AI Erase Your Secrets?

ICLR 2026 Conference Submission 17778 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Redaction, Benchmark, Security, Language Model, Privacy, Sensitive Information, Data Sanitization
TL;DR: We introduce RedacBench, a novel benchmark for the comprehensive evaluation of redaction capabilities, independent of specific data domains or redaction methods.
Abstract: The ease with which modern language models can extract sensitive information from unstructured text has made redaction, the selective removal of such information, an essential task for data security. However, existing benchmarks and evaluation methods for redaction are often limited to predefined categories of data, such as personally identifiable information (PII), or to particular techniques, such as masking. To bridge this gap, we introduce RedacBench, a novel benchmark for the comprehensive evaluation of redaction capabilities, independent of specific domains or redaction strategies. Constructed from 514 human-written texts from individuals, corporations, and governments, along with 187 security policies, RedacBench measures a model's ability to selectively remove policy-violating information while preserving the original text's utility. We quantify this performance using metrics derived from 8,053 inferable propositions, assessing both security (the successful redaction of sensitive propositions) and utility (the preservation of non-sensitive ones). Our experiments on various redaction strategies using state-of-the-art language models reveal that while more advanced models and strategies can increase security, maintaining utility remains a significant challenge. To facilitate future research, we publicly release RedacBench along with a web-based playground for custom dataset creation and evaluation at https://redacbench.vercel.app/.
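To make the proposition-based metrics concrete, here is a minimal sketch of how security and utility could be scored for one redacted text. The data class, function names, and the `is_inferable` entailment oracle are illustrative assumptions, not RedacBench's actual API; the benchmark's released code may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Proposition:
    text: str
    sensitive: bool  # True if a security policy flags this proposition

def score_redaction(
    redacted_text: str,
    propositions: Iterable[Proposition],
    is_inferable: Callable[[str, str], bool],
) -> dict[str, float]:
    """Score a redacted text against its annotated propositions.

    `is_inferable(text, proposition)` stands in for an entailment
    judge (e.g., an NLI model or an LLM prompt) that decides whether
    the proposition can still be inferred from the text.
    """
    sensitive = [p for p in propositions if p.sensitive]
    benign = [p for p in propositions if not p.sensitive]

    # Security: fraction of sensitive propositions no longer inferable.
    removed = sum(not is_inferable(redacted_text, p.text) for p in sensitive)
    security = removed / len(sensitive) if sensitive else 1.0

    # Utility: fraction of non-sensitive propositions still inferable.
    kept = sum(is_inferable(redacted_text, p.text) for p in benign)
    utility = kept / len(benign) if benign else 1.0

    return {"security": security, "utility": utility}
```

Under this framing, a trivial redactor that deletes the entire text scores perfect security but zero utility, which is exactly the trade-off the abstract highlights.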
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 17778