Track: Type A (Regular Papers)
Keywords: Reward Models, T2I, Stereotypes, Prompt Optimization
Abstract: Text-to-Image (T2I) models often reproduce societal stereotypes, particularly in their depictions of people in occupational roles. By reinforcing these harmful associations, such models contribute to an unfair and imbalanced portrayal of professions. We introduce the Diverse Job Prompt Optimizer (DJ-PO), a prototype that uses a reward model to score prompts by their likelihood of generating stereotypical images and a Large Language Model (LLM) to rewrite prompts that are likely to do so. We created a dataset from 1,200 human rankings of images generated from “diverse”, “neutral”, and “stereotypical” prompts and synthetically augmented it to 10,800 data points. Using the collected human rankings, we trained a SentenceTransformer-based reward model with a listwise ranking objective to predict a stereotypicality score for each prompt. In a human evaluation, images generated through the DJ-PO prototype were rated as less stereotypical than images generated by calling the T2I model directly. DJ-PO demonstrates that a text-only feedback loop, which optimizes prompts automatically using human ratings of stereotypicality, is a viable and resource-efficient method for correcting existing bias in T2I models without retraining or fine-tuning the T2I model. It offers a practical step toward more ethical, inclusive, and aligned AI-generated content.
Serve As Reviewer: ~Sietske_Tacoma1, ~Tina_Mioch1, ~Marieke_M._M._Peeters1
Submission Number: 48
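
Illustrative note: the abstract mentions a listwise ranking objective but does not say which one. The sketch below assumes a ListNet-style top-one cross-entropy as one common choice; it is not necessarily the loss used in DJ-PO. The function names `softmax` and `listnet_loss` and the example scores are hypothetical.

```python
import math


def softmax(scores):
    """Turn a list of real-valued scores into a probability distribution."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(predicted_scores, target_scores):
    """ListNet-style top-one cross-entropy between two score lists.

    Assumption: target_scores are derived from human rankings (higher means
    more stereotypical); predicted_scores come from the reward model.
    """
    if len(predicted_scores) != len(target_scores):
        raise ValueError("score lists must have the same length")
    p_target = softmax(target_scores)
    p_pred = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(p_target, p_pred))


# Hypothetical scores for one list of three prompts.
human = [2.0, 1.0, 0.0]
agreeing = listnet_loss([2.0, 1.0, 0.0], human)
reversed_order = listnet_loss([0.0, 1.0, 2.0], human)
```

Under this assumed loss, predictions that order the prompts as the human ranking does (`agreeing`) incur a lower loss than predictions in the opposite order (`reversed_order`).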