The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing

Anonymous

16 Feb 2024
ACL ARR 2024 February Blind Submission
Readers: Everyone
Abstract: The output tendencies of pre-trained language models (PLMs) vary markedly before and after fine-tuning (FT) due to updates to the model parameters. These divergences in output tendencies create a gap in the social biases of PLMs. For example, under FT-based debiasing methods, a PLM's intrinsic bias scores correlate only weakly with its extrinsic bias scores. Moreover, applying FT-based debiasing methods to a PLM degrades its performance on downstream tasks. On the other hand, PLMs trained on large datasets can learn without parameter updates via in-context learning (ICL) using prompts. ICL induces smaller changes to a PLM than FT-based debiasing methods do. We therefore hypothesize that the gap observed between pre-trained and FT models does not hold for debiasing methods based on ICL. In this study, we demonstrate that ICL-based debiasing methods show a higher correlation between intrinsic and extrinsic bias scores than FT-based methods. Moreover, the performance degradation caused by debiasing is smaller with ICL than with FT.
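The abstract's comparison rests on two concrete quantities: the correlation between intrinsic and extrinsic bias scores across debiased model variants, and a debiasing intervention that modifies only the prompt, never the weights. The following is a minimal Python sketch of both, not the authors' code: the score values, the DEBIAS_PREFIX text, and the debias_via_icl helper are hypothetical placeholders for illustration.

    from scipy.stats import pearsonr

    # Hypothetical intrinsic scores (e.g., from a likelihood-based bias
    # benchmark) paired with hypothetical extrinsic scores (e.g., a
    # downstream fairness metric), one pair per debiased model variant.
    intrinsic = [0.62, 0.55, 0.48, 0.41, 0.35]
    extrinsic = [0.58, 0.52, 0.45, 0.40, 0.33]

    # The paper's claim concerns this correlation: higher under ICL-based
    # debiasing than under FT-based debiasing.
    r, p = pearsonr(intrinsic, extrinsic)
    print(f"Pearson r = {r:.3f} (p = {p:.3g})")

    # ICL-based debiasing in its simplest form: prepend an instruction to
    # the task input. The PLM's parameters are never updated, which is why
    # the intervention perturbs output tendencies less than fine-tuning.
    DEBIAS_PREFIX = (
        "Answer without relying on stereotypes about gender, race, "
        "or religion.\n"
    )

    def debias_via_icl(task_input: str) -> str:
        # Build the debiased prompt; only the input changes, not the weights.
        return DEBIAS_PREFIX + task_input

    print(debias_via_icl("The nurse said that"))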
Paper Type: short
Research Area: Ethics, Bias, and Fairness
Contribution Types: NLP engineering experiment
Languages Studied: English