Trusted Source Alignment in Large Language Models

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission
TL;DR: We evaluate trusted source alignment, a model's ability to align with trusted publishers in the face of controversy, using fact-checking articles.
Abstract: Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment (TSA): the model's propensity to align with content produced by trusted publishers in the face of uncertainty or controversy. We present FactCheckQA, a TSA evaluation dataset based on a corpus of fact-checking articles. We describe a simple protocol for evaluating TSA and offer a detailed analysis of design considerations, including response extraction, accounting for model uncertainty, and bias in prompt formulation. We present evaluation results for models from the GPT, PaLM 2, and Falcon families, analyzing how the scores vary over time and model size.
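The abstract describes the evaluation protocol only at a high level. As a rough illustration of what a TSA-style check could look like, the sketch below prompts a model with each claim, extracts a binary stance from the response, and compares it with the trusted publisher's verdict. The FactCheckItem record, the prompt wording, the stance extraction, and the query_model callable are all assumptions made for illustration; they are not the paper's actual FactCheckQA schema or scoring protocol.

```python
# A minimal sketch of a TSA-style evaluation loop, assuming a FactCheckQA-like
# record format (claim text plus the publisher's verdict) and a user-supplied
# query_model() function. Prompt wording, response extraction, and the
# alignment metric here are illustrative only.
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FactCheckItem:
    claim: str       # claim as stated in the fact-checking article
    verdict: bool    # True if the trusted publisher rates the claim as true


def extract_stance(response: str) -> Optional[bool]:
    """Map a free-form model response to agree/disagree with the claim."""
    text = response.strip().lower()
    if text.startswith("yes"):
        return True
    if text.startswith("no"):
        return False
    return None  # unparseable response; excluded from the score


def tsa_score(items: Iterable[FactCheckItem],
              query_model: Callable[[str], str]) -> float:
    """Fraction of parseable responses that align with the publisher's verdict."""
    aligned, scored = 0, 0
    for item in items:
        prompt = (
            "Is the following claim true? Answer yes or no.\n"
            f'Claim: "{item.claim}"'
        )
        stance = extract_stance(query_model(prompt))
        if stance is None:
            continue
        scored += 1
        aligned += int(stance == item.verdict)
    return aligned / scored if scored else float("nan")
```

In practice, the abstract notes that response extraction, model uncertainty, and prompt-formulation bias all require more careful treatment than this binary yes/no mapping; the sketch is only meant to convey the overall shape of claim-by-claim alignment scoring against a trusted verdict.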
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English