Keywords: Benchmark for Privacy-Preserving ML, AI Safety
TL;DR: Benchmarking various privacy-preserving machine learning strategies on PII leakage rate and inference latency.
Abstract: Large language models frequently leak personally identifiable information (PII)
during text generation, posing significant privacy risks. While post-hoc filtering
methods (e.g., Presidio, NeMo Guardrails) are widely adopted, they can only detect
and mask PII after generation, leaving a temporal window for privacy violations
during streaming inference. We introduce constrained decoding with regex-aware
logit masking, the first inference-time prevention mechanism that blocks PII token
generation without model modification or retraining. Our approach maintains
a rolling window of generated text, applies pattern detection for structured PII
(emails, SSNs, IP addresses, credit cards), and masks out, in the next-token
probability distribution, every token that would extend a detected pattern.
Evaluating our method on a synthetic 14-label PII suite spanning true-prefix
attacks, contextual rewrites, and record-format queries, we demonstrate
substantial leakage reduction with competitive latency overhead. Requiring no
state beyond the rolling text window, this decoding-time mechanism integrates
seamlessly with standard inference stacks and provides provable privacy
guarantees for the covered pattern classes by preventing PII generation at the
token level rather than redacting it post hoc.
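The abstract stops short of pseudocode, so the sketch below is a minimal Python reading of the described mechanism under stated assumptions: `PII_PATTERNS`, the 64-character window, and the helpers `would_extend_pii` and `mask_pii_logits` are illustrative names of ours, not the paper's API, and the regexes are simplified stand-ins for a real pattern set.

```python
import math
import re

# Illustrative regexes for the structured PII classes the abstract lists.
# Production systems would use broader, validated pattern sets
# (e.g. Luhn checksums for card numbers).
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),    # email address
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSN
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),    # IPv4 address
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),         # credit card number
]

WINDOW_CHARS = 64  # rolling-window length; an assumed hyperparameter


def would_extend_pii(window: str, candidate: str) -> bool:
    """True if appending `candidate` to the rolling window completes or
    extends a match for any structured-PII pattern."""
    tail = window[-WINDOW_CHARS:]
    extended = tail + candidate
    for pattern in PII_PATTERNS:
        for match in pattern.finditer(extended):
            if match.end() > len(tail):  # match reaches into the new token
                return True
    return False


def mask_pii_logits(logits, vocab, window):
    """Set the logit of every token whose surface form would extend a
    detected PII pattern to -inf, so it can never be sampled."""
    return [
        -math.inf if would_extend_pii(window, tok) else logit
        for logit, tok in zip(logits, vocab)
    ]


# Toy usage: a 4-token "vocabulary" and a window ending mid-SSN.
vocab = ["-6789", " the", "@", "42"]
logits = [3.1, 0.2, 1.5, 0.8]
window = "My SSN is 123-45"
print(mask_pii_logits(logits, vocab, window))
# -> [-inf, 0.2, 1.5, 0.8]: only the token completing the SSN is blocked.
```

Scanning the full vocabulary at every step is the obvious cost driver; restricting the check to the top-k candidate tokens, or caching decisions keyed on the window tail, is presumably how a real implementation keeps the reported latency overhead competitive, though the abstract does not say.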
Submission Number: 12