Keywords: Bias quantification, Parameter-efficient fine-tuning (PEFT), Large language models (LLMs), Model evaluation, Trustworthy AI
TL;DR: We introduce a PEFT-based method to measure bias in large language models and validate it on three different models.
Abstract: Bias and explainability are growing research areas that are essential for understanding large language models and how they perform. Understanding how these models work provides insights that help train them more efficiently and effectively, and supports the design of more factual and less ambiguous models. In this study, we propose using parameter-efficient fine-tuning (PEFT) to measure bias, an approach that is both accessible and computationally affordable. We design two datasets with identical questions and contrasting young-oriented and old-oriented answers. Through designed experiments and their analysis, we demonstrate the value of PEFT in measuring bias and in taking a step toward unveiling the black-box nature of large language models. Our experiments across three models (Qwen 1.8B, Llama 7B, and Yi 6B) demonstrated consistent bias patterns, with models typically converging faster on the old-oriented dataset, although with discernible convergence margins. Additionally, we validated the results using statistical tests to highlight the robustness of our methodology. This approach could be especially valuable for models employed in sensitive domains such as law and healthcare, where consistent logical reasoning regardless of demographics is essential.
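The convergence-based probe described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical implementation, not the authors' released code: it assumes LoRA via Hugging Face's `peft` library as the PEFT method, a simple loss-threshold convergence criterion, and illustrative dataset file names `young.jsonl`/`old.jsonl`; the model name and hyperparameters are placeholders.

```python
# Minimal sketch of the paper's idea (assumptions noted above): fine-tune the
# same frozen base model with a small PEFT adapter on two contrasting datasets
# and compare how quickly the training loss converges on each.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

def steps_to_converge(dataset_path, model_name="Qwen/Qwen-1_8B",
                      loss_threshold=1.0):
    """Return the first logged step at which training loss < threshold."""
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # some tokenizers lack a pad token
    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
    # LoRA trains only a tiny adapter on top of the frozen base model,
    # which keeps the probe computationally affordable.
    model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                             task_type="CAUSAL_LM"))

    # Assumes each JSONL record has a "text" field (question + oriented answer).
    data = load_dataset("json", data_files=dataset_path)["train"]
    data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                         max_length=256))

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", per_device_train_batch_size=4,
                               num_train_epochs=3, logging_steps=1,
                               report_to=[]),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    for entry in trainer.state.log_history:
        if entry.get("loss", float("inf")) < loss_threshold:
            return entry["step"]
    return None

# Faster convergence on one dataset suggests the base model already sits
# closer to that demographic's answer distribution, i.e. it is biased.
young_steps = steps_to_converge("young.jsonl")
old_steps = steps_to_converge("old.jsonl")
print(f"young-oriented: {young_steps} steps, old-oriented: {old_steps} steps")
```

Because the adapter is the only trainable component, the gap in convergence speed between the two runs serves as the bias signal, rather than any property of the adapter weights themselves.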
Submission Number: 161