BELIEFs: Bias-resilient, Multifaceted Evaluation of Language Models in Factual Knowledge Understanding

ACL ARR 2024 April Submission793 Authors

16 Apr 2024 (modified: 23 May 2024) · ACL ARR 2024 April Submission · CC BY 4.0
Abstract: Fill-in-the-blank prompts are widely used to evaluate how well pre-trained language models (PLMs) capture real-world factual knowledge. However, prompt-based evaluation results vary significantly with the linguistic expression of the prompt, even for the same fact. To assess PLMs' factual understanding more fairly, we introduce a new dataset, MyriadLAMA, together with the evaluation benchmark BELIEF and its variant BELIEF-ICL, designed for encoder-based and decoder-based PLMs, respectively. MyriadLAMA provides diverse fill-in-the-blank prompts for each fact; BELIEFs leverage this diversity to mitigate prompt bias during factual knowledge probing by consolidating results from multiple prompts, and to offer a comprehensive evaluation of factual knowledge in PLMs covering accuracy, consistency, and reliability. We validate the efficacy of the BELIEFs through comprehensive evaluations of encoder-based and decoder-based PLMs.
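The multi-prompt consolidation described in the abstract can be sketched as follows. This is a minimal illustration with hypothetical predictions and a simple majority-vote aggregator plus a pairwise-agreement consistency score; it is not the paper's actual BELIEF scoring procedure.

```python
from collections import Counter

# Hypothetical example: one fact probed with several paraphrased
# fill-in-the-blank prompts, as in MyriadLAMA. The values stand in
# for a PLM's top-1 answer to each prompt.
prompt_predictions = {
    "The capital of France is [MASK].": "Paris",
    "France's capital city is [MASK].": "Paris",
    "[MASK] is the capital of France.": "Lyon",  # prompt-induced error
}

def aggregate(predictions):
    """Consolidate answers across prompts by majority vote,
    reducing the bias of any single linguistic expression."""
    counts = Counter(predictions.values())
    answer, _ = counts.most_common(1)[0]
    return answer

def consistency(predictions):
    """Fraction of prompt pairs that agree: one simple way
    to quantify sensitivity to prompt wording."""
    preds = list(predictions.values())
    pairs = [(a, b) for i, a in enumerate(preds) for b in preds[i + 1:]]
    return sum(a == b for a, b in pairs) / len(pairs)

print(aggregate(prompt_predictions))    # "Paris"
print(consistency(prompt_predictions))  # 1/3 of pairs agree
```

Aggregating over paraphrases makes the accuracy estimate less dependent on any single phrasing, while the consistency score exposes how much the model's answer shifts with wording.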
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Interpretability and Analysis of Models for NLP, Information Extraction, Language Modeling
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 793