Keywords: Ethics, Bias, and Fairness
Abstract: While Text-to-Image (T2I) diffusion models have achieved remarkable synthesis quality, these models may inherit or even amplify biases present in their training data. Recent debiasing methods have made notable progress in mitigating such unintentional biases. However, a largely overlooked threat is the intentional injection of bias via backdoor attacks, and this threat is especially underexplored in non-English settings such as Chinese. In this paper, we show that English-centric backdoors transfer poorly to Chinese T2I models due to tokenization and logographic-script differences. To bridge this gap, we conduct the first systematic study of Character-level Bias Backdoor Attack (CBBA) tailored to the Chinese linguistic landscape. CBBA introduces three stealthy trigger strategies---quotation embedding, traditional character conversion, and invisible Unicode injection---that exploit Chinese-specific orthographic variants and encoding quirks to evade detection. These triggers are embedded via a novel cross-modal alignment mechanism that enforces a strong association between the trigger and the target bias while preserving semantic consistency for benign inputs. Extensive experiments on mainstream T2I models demonstrate that CBBA achieves an Attack Success Rate (ASR) exceeding 80% (at a 20% poisoning rate) while maintaining near-perfect utility. Furthermore, CBBA exhibits superior robustness against state-of-the-art defenses, maintaining an ASR 2 to 4 times higher than baseline attacks.
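To make the third trigger strategy concrete, here is a minimal sketch of invisible Unicode injection of the kind the abstract describes: a zero-width character is inserted into a Chinese prompt so that the poisoned prompt renders identically to the clean one. This is an illustration, not the paper's implementation; the function name `inject_invisible_trigger`, the insertion position, and the choice of U+200B as the trigger character are all assumptions.

```python
# Illustrative sketch only (not CBBA's actual code): an invisible-Unicode
# trigger built from a zero-width character, which most UIs render as nothing.
ZWSP = "\u200b"  # ZERO WIDTH SPACE

def inject_invisible_trigger(prompt: str, pos: int = 1) -> str:
    """Insert a zero-width character so the prompt looks visually unchanged.

    The position and character are hypothetical choices for illustration.
    """
    return prompt[:pos] + ZWSP + prompt[pos:]

clean = "一位医生在医院工作"  # "A doctor working in a hospital"
poisoned = inject_invisible_trigger(clean)

print(clean == poisoned)                     # False: byte strings differ
print(poisoned.replace(ZWSP, "") == clean)   # True: visually identical
```

The point of such a trigger is exactly this asymmetry: a tokenizer sees a different input sequence, while a human reviewer inspecting the rendered prompt sees nothing unusual.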
Paper Type: Long
Research Area: Ethics, Bias, and Fairness
Research Area Keywords: model bias/fairness evaluation
Languages Studied: Chinese, English
Submission Number: 9147