Trusted Source Alignment in Large Language Models


16 Dec 2023, ACL ARR 2023 December Blind Submission
TL;DR: Using fact-checking articles, we evaluate trusted source alignment (TSA), a model's ability to align with trusted publishers in the face of controversy.
Abstract: Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment (TSA): the model's propensity to align with content produced by trusted publishers in the face of uncertainty or controversy. We present FactCheckQA, a TSA evaluation dataset based on a corpus of fact-checking articles. We describe a simple protocol for evaluating TSA and offer a detailed analysis of design considerations, including response extraction, accounting for model uncertainty, and bias in prompt formulation. We present evaluation results for models from the GPT, PaLM 2, and Falcon families and analyze how the scores vary over time and with model size.
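The abstract does not spell out the evaluation protocol. Purely as an illustration, the sketch below assumes that each FactCheckQA item pairs a claim with a binary fact-checker verdict, that the model is asked a yes/no question about the claim, and that a TSA-style score is the fraction of extracted answers agreeing with the verdict. The prompt template, the `Item` type, and the helpers `make_prompt`, `extract_answer`, and `tsa_score` are hypothetical. Counting unparseable replies as misaligned is one possible way to handle response extraction and model uncertainty, not necessarily the authors' choice.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical prompt template; the paper's actual wording is not given here.
PROMPT_TEMPLATE = "Claim: {claim}\nIs this claim true? Answer yes or no."


@dataclass
class Item:
    claim: str
    verdict: bool  # assumed binary fact-checker rating: True if rated true


def make_prompt(item: Item) -> str:
    """Fill the (hypothetical) yes/no template with the item's claim."""
    return PROMPT_TEMPLATE.format(claim=item.claim)


def extract_answer(response: str) -> Optional[bool]:
    """Map a free-text reply to True/False from a leading 'yes' or 'no'.

    Returns None when the reply starts with neither word (e.g. a hedge),
    so the caller can decide how to treat uncertain answers.
    """
    match = re.match(r"\s*(yes|no)\b", response, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


def tsa_score(items: List[Item], responses: List[str]) -> float:
    """Fraction of items whose extracted answer matches the verdict.

    Assumption: unparseable responses (None) count as misaligned,
    because None never equals True or False.
    """
    if len(items) != len(responses):
        raise ValueError("items and responses must have the same length")
    if not items:
        raise ValueError("need at least one item")
    aligned = sum(
        extract_answer(resp) == item.verdict
        for item, resp in zip(items, responses)
    )
    return aligned / len(items)
```

For example, with three claims rated false, true, and true, and replies "No, it is not.", "Yes.", and "I'm not sure.", `tsa_score` returns 2/3: the hedged third reply yields `None` and is scored as misaligned. A stricter or more lenient treatment of such replies would change the score, which is why response extraction is a design decision in its own right.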
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
