Abstract: Generative text-to-image (T2I) models are capable of producing high-quality images from user prompts. However, these models are known to generate sexually explicit content even for benign prompts, posing safety risks and misaligning with user intent. While emerging research proposes mitigation techniques to reduce sexually explicit content, no systematic benchmark yet exists to evaluate their effectiveness. Furthermore, little attention has been paid to oversexualization, cases where the generated images are more sexualized than the user prompt intends, which presents a distinct safety risk. Oversexualization may have more adverse outcomes than intentional adversarial prompting because it exposes users to harmful content they never sought. In this paper, we introduce the first comprehensive benchmark of adaptation methods, including both inference-time and fine-tuning methods, for mitigating oversexualized content in T2I models. We also introduce a novel benchmark dataset, Benign2NSFW, designed to provoke oversexualization in T2I systems, allowing the community to measure the effectiveness of such techniques. Finally, we assess the impact of reducing oversexualization on other factors, such as aesthetic quality and image-prompt alignment. Our work offers a comprehensive overview of various strategies for harm reduction in T2I systems, which we hope will help practitioners balance safety with other quality aspects.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Hongsheng_Li3
Submission Number: 5046