Constrained Decoding for Privacy-Preserving LLM Inference

Published: 08 Nov 2025, Last Modified: 21 Nov 2025, ResponsibleFM @ NeurIPS 2025, CC BY 4.0
Keywords: Benchmark for Privacy Preserving ML, AI Safety
TL;DR: Benchmarking various privacy-preserving machine learning strategies on PII leakage rate and inference latency.
Abstract: Large language models frequently leak personally identifiable information (PII) during text generation, posing significant privacy risks. While post-hoc filtering methods (e.g., Presidio, NeMo Guardrails) are widely adopted, they can only detect and mask PII after it has been generated, leaving a temporal window for privacy violations during streaming inference. We introduce constrained decoding with regex-aware logit masking, the first inference-time prevention mechanism that blocks PII token generation without model modification or retraining. Our approach maintains a rolling window of generated text, applies pattern detection for structured PII (emails, SSNs, IP addresses, credit cards), and masks, in the next-token distribution, those tokens that would extend a detected pattern. Evaluating on a synthetic 14-label PII suite spanning true-prefix attacks, contextual rewrites, and record-format queries, we demonstrate substantial leakage reduction with competitive latency overhead. This stateless decoding-time mechanism integrates seamlessly with standard inference stacks, providing provable privacy guarantees by preventing PII generation at the token level rather than redacting it post hoc.
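The abstract describes the mechanism only in prose; as a concrete illustration, the minimal Python sketch below shows one way regex-aware logit masking over a rolling window could be wired up. The pattern set, the window size, and the names `PII_PATTERNS`, `would_extend_pii`, and `mask_pii_logits` are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of regex-aware logit masking over a rolling text window.
# All names and patterns here are illustrative, not the authors' code.
import math
import re

# Prefix-tolerant regexes for the structured PII classes named in the
# abstract (emails, SSNs, IPv4 addresses, credit-card-like numbers), each
# anchored at the end of the window so partial matches are flagged early.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w.-]*$"),           # email, prefix-tolerant
    re.compile(r"\b\d{3}-\d{2}-\d{0,4}$"),      # SSN, e.g. 123-45-6789
    re.compile(r"\b(?:\d{1,3}\.){3}\d{0,3}$"),  # IPv4 address
    re.compile(r"\b\d{4}(?:[ -]\d{4}){1,3}$"),  # credit-card-like digit groups
]

def would_extend_pii(window: str, candidate: str) -> bool:
    """Return True if appending `candidate` to the rolling window keeps or
    extends a match for any structured-PII pattern at the window's end."""
    extended = window + candidate
    return any(p.search(extended) for p in PII_PATTERNS)

def mask_pii_logits(logits, vocab, window, window_size=64):
    """Set the logits of risky continuations to -inf (the masking step),
    checking only a bounded tail of the generated text."""
    tail = window[-window_size:]
    return [
        -math.inf if would_extend_pii(tail, tok) else score
        for score, tok in zip(logits, vocab)
    ]

# Toy example: the model is one token away from completing an SSN.
vocab = ["6789", " the", "@", " and"]
logits = [5.0, 1.0, 0.5, 0.2]
print(mask_pii_logits(logits, vocab, "My SSN is 123-45-"))
# -> [-inf, 1.0, -inf, 0.2]: the SSN completion and the email-forming "@"
#    are blocked; benign continuations keep their original scores.
```

In a real serving stack, the same check would typically sit behind the framework's logit-processing hook (e.g., a Hugging Face `LogitsProcessor`), applied once per decoding step before sampling.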
Submission Number: 12