Keywords: Bias Quantification, Model Evaluation, Prompt-Tuning, Language Models
TL;DR: We present a novel and reproducible way to evaluate social bias in language models through soft-prompt tuning.
Abstract: Prompting large language models (LLMs) has gained substantial popularity as pre-trained LLMs are capable of performing downstream tasks without requiring large quantities of labelled data. It is, therefore, natural that prompting is also used to evaluate biases exhibited by these models. However, achieving good task-specific performance often requires manual prompt optimization. In this paper, we explore the use of soft-prompt tuning to quantify the biases of LLMs such as OPT and LLaMA. These models are trained on real-world data with potential implicit biases toward certain groups. Since LLMs are increasingly used across many industries and applications, it is crucial to accurately and efficiently identify such biases and their practical implications.
Specifically, we use soft-prompt tuning to evaluate model bias across several sensitive attributes through the lens of group fairness. In addition to improving task performance, soft-prompt tuning avoids the potential injection of human bias through manually designed prompts. Probing with prompt-tuning reveals important bias patterns, including disparities across age and sexuality.
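To make the setup concrete, the sketch below illustrates the general idea of soft-prompt tuning a frozen causal LM and then measuring a group-fairness gap on its predictions. It is not the authors' code: the model name (facebook/opt-125m), the toy inputs, the label-word probe, and the demographic parity difference metric are illustrative assumptions only.

```python
# Minimal sketch (assumptions noted above, not the paper's implementation):
# train a soft prompt on a frozen causal LM, then compute a group-fairness gap.
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer


class SoftPrompt(nn.Module):
    """Learnable prompt embeddings prepended to the frozen LM's input embeddings."""
    def __init__(self, n_tokens: int, embed_dim: int):
        super().__init__()
        self.embeddings = nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        prompt = self.embeddings.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)


def demographic_parity_gap(preds: torch.Tensor, groups: torch.Tensor) -> float:
    """Largest absolute difference in positive-prediction rates across groups."""
    rates = [preds[groups == g].float().mean() for g in groups.unique()]
    return (max(rates) - min(rates)).item()


model_name = "facebook/opt-125m"  # placeholder; the paper evaluates OPT and LLaMA
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.padding_side = "left"   # so the last position holds the final real token
model = AutoModelForCausalLM.from_pretrained(model_name)
for p in model.parameters():      # freeze the LM; only the soft prompt is trained
    p.requires_grad_(False)

embed_layer = model.get_input_embeddings()
soft_prompt = SoftPrompt(n_tokens=20, embed_dim=embed_layer.embedding_dim)
optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=1e-3)

# Toy probe inputs tagged with a sensitive attribute (here: age group 0 vs. 1).
texts = [
    "The applicant is 25 years old and should be hired:",
    "The applicant is 70 years old and should be hired:",
]
groups = torch.tensor([0, 1])
labels = torch.tensor([1, 1])     # toy task labels for the tuning step

enc = tokenizer(texts, return_tensors="pt", padding=True)
embeds = soft_prompt(embed_layer(enc["input_ids"]))

# Extend the attention mask to cover the prepended soft-prompt positions.
prompt_mask = torch.ones(len(texts), soft_prompt.embeddings.size(0), dtype=torch.long)
attn_mask = torch.cat([prompt_mask, enc["attention_mask"]], dim=1)

out = model(inputs_embeds=embeds, attention_mask=attn_mask)
last_logits = out.logits[:, -1, :]

# Verbalizer-style probe: compare next-token logits for two label words.
yes_id = tokenizer(" yes", add_special_tokens=False).input_ids[0]
no_id = tokenizer(" no", add_special_tokens=False).input_ids[0]

# One prompt-tuning step: cross-entropy over the two label-word logits.
logit_pair = torch.stack([last_logits[:, no_id], last_logits[:, yes_id]], dim=1)
loss = nn.functional.cross_entropy(logit_pair, labels)
loss.backward()
optimizer.step()

# Bias readout: positive-prediction rates per sensitive group.
preds = (last_logits[:, yes_id] > last_logits[:, no_id]).long()
print("demographic parity gap:", demographic_parity_gap(preds, groups))
```

In practice the soft prompt would be tuned over many labelled examples and the fairness gap computed on a held-out probe set; the single step and two examples above are only meant to show how the pieces fit together.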
Submission Number: 90