SPADE: SEMANTIC-PRESERVING ADAPTIVE DETOXIFICATION OF IMAGES

ICLR 2026 Conference Submission244 Authors

01 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Toxic, Text-to-Image, Detoxification, Stable Diffusion, Image
TL;DR: Detoxified Variants of Toxic Images
Abstract: Image generation models often struggle with safety-critical edits, especially detoxifying harmful visual content without losing semantic context. We introduce SPADE, a novel dataset for *controlled, graded detoxification of toxic images*. Each toxic image is paired with three semantically aligned, progressively detoxified variants that preserve knowledge relevance, scene context, and visual consistency. This pairing lets models learn fine-grained detoxification editing beyond binary filtering, addressing the trade-off between harm reduction and semantic preservation. SPADE comprises 2,500 toxic images together with their multi-level detoxified counterparts, captions, and contextual stories. We benchmark detoxification using human preferences, CLIP-based similarity, and structural metrics, establishing SPADE as the first resource for graded, controllable detoxification in image generation. Our work lays the foundation for safe, interpretable, and context-aware visual moderation.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 244