Constrained Decoding for Privacy-Preserving LLM Inference

Published: 06 Nov 2025, Last Modified: 06 Nov 2025 · AIR-FM Poster · CC BY 4.0
Confirmation: I have read and agree with the workshop's policy on behalf of myself and my co-authors.
Keywords: LLM Reliability, Constrained Decoding, inference-time PII prevention
Abstract: Large language models frequently leak personally identifiable information (PII) during text generation, posing significant privacy risks. While post-hoc filtering methods (e.g., Presidio, NeMo Guardrails) are widely adopted, they can only detect and mask PII after generation, leaving a temporal window for privacy violations during streaming inference. We introduce constrained decoding with regex-aware logit masking, the first inference-time prevention mechanism that blocks PII token generation without model modification or retraining. Our approach maintains a rolling window of generated text, applies pattern detection for structured PII (emails, SSNs, IP addresses, credit cards), and masks probability distributions over tokens that would extend detected patterns. Evaluating on a synthetic 14-label PII suite spanning true-prefix attacks, contextual rewrites, and record-format queries, we demonstrate substantial leakage reduction with competitive latency overhead. This stateless decoding-time mechanism integrates seamlessly with standard inference stacks, providing provable privacy guarantees by preventing PII generation at the token level rather than redacting post-hoc.
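The abstract's mechanism (rolling text window, regex detection of structured PII, masking logits of tokens that would extend a detected pattern) can be sketched as follows. This is a minimal illustration assuming a toy string-token vocabulary; the names `PII_PATTERNS`, `would_leak`, and `mask_pii_logits` are ours, not from the paper.

```python
import math
import re

# Patterns for structured PII (here: SSNs and emails as examples).
SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")
EMAIL_RE = re.compile(r"[\w.]+@[\w.]+\.\w{2,}")
PII_PATTERNS = [SSN_RE, EMAIL_RE]

WINDOW = 32  # rolling window of most recent generated characters


def would_leak(window_text: str, token: str) -> bool:
    """True if appending `token` to the window completes a PII pattern."""
    candidate = (window_text + token)[-WINDOW:]
    return any(p.search(candidate) for p in PII_PATTERNS)


def mask_pii_logits(logits, vocab, window_text):
    """Set the logits of pattern-completing tokens to -inf so the
    sampler can never emit them (prevention, not post-hoc redaction)."""
    return [
        -math.inf if would_leak(window_text, tok) else logit
        for logit, tok in zip(logits, vocab)
    ]


# The window already holds a partial SSN; the token "1234" would
# complete it and is therefore masked, while benign tokens pass.
vocab = ["1234", " the", "-45-", "cat"]
logits = [2.0, 1.0, 0.5, 0.1]
masked = mask_pii_logits(logits, vocab, "call me at 123-45-")
```

Because the check operates only on a bounded suffix of the generated text, it is stateless across requests and can sit between the model's logit output and the sampler in a standard inference stack.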
Submission Track: Workshop Paper Track
Submission Number: 19