ABCDE: Agentic-Based Controlled Dynamic Erasure for Intent-Aware Safety Reasoning

Published: 20 Apr 2026, Last Modified: 20 Apr 2026, Accepted by TMLR, CC BY 4.0
Abstract: Concept erasure has emerged as a central mechanism for safety alignment in text-conditioned generative models, yet most existing approaches implicitly adopt an unconditional suppression paradigm in which target concepts are removed whenever they appear, regardless of contextual intent. This formulation conflates benign and harmful concept usage, leading to systematic over-suppression that unnecessarily censors policy-compliant content and degrades model utility. We argue that safety intervention should instead be framed as a decision problem grounded in contextual language understanding, rather than as a purely mechanistic removal operation. Based on this perspective, we introduce Intent-Aware Concept Erasure (ICE), a decision-centric formulation that explicitly separates the question of whether a concept should be suppressed from how suppression is realized, enabling context-sensitive intervention policies that preserve benign usage while maintaining safety guarantees. To operationalize this formulation, we present Agentic-Based Controlled Dynamic Erasure (ABCDE), an agentic framework that infers a stable intervention decision from semantic context and realizes it through minimal prompt-level intervention with closed-loop multimodal output feedback, without modifying model parameters. To enable principled evaluation of intent-aware intervention, we further construct the Context-Aware Erasure Benchmark (CAEB), a paired benchmark comprising 500 prompts over 10 object concepts and 100 prompts over 5 artist styles, in which the same concept appears in both removal-required and preservation-required contexts. Experiments on CAEB show that ABCDE achieves substantially higher precision than unconditional baselines while maintaining strong recall, demonstrating effective avoidance of unnecessary suppression in benign contexts.
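As a rough illustration of the evaluation the abstract describes, the sketch below computes precision and recall for erase decisions on a tiny paired dataset. It is not the authors' evaluation code. The `Example` record, the `precision_recall` helper, the toy prompts, and the choice of "erase" as the positive class are all assumptions made for illustration, and the toy data is assumed to be balanced between the two contexts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One benchmark item (hypothetical schema, not the CAEB format)."""
    prompt: str
    concept: str
    should_erase: bool  # True if the context requires removal


def precision_recall(examples, decisions):
    """Precision and recall of erase decisions, with 'erase' as the positive class."""
    pairs = list(zip(examples, decisions))
    tp = sum(1 for ex, d in pairs if d and ex.should_erase)
    fp = sum(1 for ex, d in pairs if d and not ex.should_erase)
    fn = sum(1 for ex, d in pairs if not d and ex.should_erase)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy paired data: each concept appears once in a removal-required
# context and once in a preservation-required context (placeholder prompts).
toy = [
    Example("removal-required prompt for concept_a", "concept_a", True),
    Example("preservation-required prompt for concept_a", "concept_a", False),
    Example("removal-required prompt for concept_b", "concept_b", True),
    Example("preservation-required prompt for concept_b", "concept_b", False),
]

# Unconditional baseline: erase whenever the concept appears.
unconditional = [True] * len(toy)

# Idealized intent-aware policy: erase only when the context requires it.
intent_aware = [ex.should_erase for ex in toy]

print(precision_recall(toy, unconditional))  # (0.5, 1.0)
print(precision_recall(toy, intent_aware))   # (1.0, 1.0)
```

Under these assumptions, an unconditional policy reaches full recall but only 0.5 precision on balanced pairs, because every preservation-required prompt becomes a false positive. This is the over-suppression effect that a paired benchmark is designed to expose.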
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~bo_han2
Submission Number: 7123