Abstract: A computer vision model is only as good as its training data (Moseley, 2024). Visual biases and spurious correlations in training data can manifest in model
decision-making, potentially leading to discrimination against sensitive groups (Mehrabi
et al., 2021). To address this issue, various methods have been proposed to automate bias
discovery and use these biases to train bias-aware computer vision models. However, one
major drawback of these methods is their lack of transparency, as the discovered biases are
often not human-interpretable. To overcome this, Kim et al. introduced a Bias-to-Text
(B2T) framework that identifies visual biases as keywords and expresses them in natural
language. This paper aims to reproduce their findings and expand upon the evaluation
methods. The central claims that the authors make are that B2T (i) can discover both
novel and known biases, (ii) can facilitate debiased training of image classifiers, and (iii) can
be deployed across different classifier architectures, such as Transformer- and convolutional
neural network (CNN)-based models. We successfully reproduce their main claims and extend their findings by analyzing whether novel bias keywords discovered by B2T represent
actual biases. Additionally, we conduct further robustness experiments, leading us to conclude that the framework not only discovers biases in the data but is also sensitive to changes in the underlying classification model, highlighting a future research direction. Our code is
publicly available at https://anonymous.4open.science/r/B2T-Repr-898B.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Shiguang_Shan2
Submission Number: 4317