What Evidence Do Language Models Find Convincing?

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Large language models (LLMs) are being tasked with increasingly open-ended, delicate, and subjective tasks. In particular, retrieval-augmented models can now answer contentious or subjective questions (e.g., "is aspartame linked to cancer?") and, in doing so, condition on arbitrary websites that vary wildly in style, format, and veracity. Importantly, information from these websites often conflicts. Humans face similar conflicts, and to reach an answer they critically evaluate the arguments, trustworthiness, and credibility of each source. In this work, we study what types of evidence current LLMs find convincing, and whether their judgements align with human preferences. Specifically, we construct ConflictingQA, a benchmark that pairs controversial questions with evidence documents that differ in their facts (e.g., quantitative results), argument styles (e.g., appeals to authority), and answers (Yes or No). Using this benchmark, we perform sensitivity analyses and counterfactual experiments to explore how in-the-wild differences in text affect model judgements. We find that models rely heavily on the relevance of a website to the user's search query, whereas the stylistic features we tested have little influence on model predictions.
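
To make the setup concrete, the following is a minimal, hypothetical sketch (in Python) of what a ConflictingQA-style record and a paired-evidence prompt might look like. The field names, example URLs, and the build_prompt helper are illustrative assumptions, not the paper's actual schema or code.

# Hypothetical sketch of a ConflictingQA-style record and a paired-evidence
# prompt for eliciting a Yes/No judgement from an LLM. All names are
# illustrative assumptions, not the paper's released format.
from dataclasses import dataclass

@dataclass
class EvidenceDoc:
    url: str     # source website
    text: str    # extracted passage
    stance: str  # the answer ("Yes" or "No") this passage supports

@dataclass
class ConflictingQAExample:
    question: str                 # controversial Yes/No question
    evidence: list[EvidenceDoc]   # documents that conflict with one another

def build_prompt(example: ConflictingQAExample) -> str:
    """Format the question and its conflicting evidence for an LLM judge."""
    docs = "\n\n".join(
        f"[Document {i + 1}] ({doc.url})\n{doc.text}"
        for i, doc in enumerate(example.evidence)
    )
    return (
        f"Question: {example.question}\n\n{docs}\n\n"
        "Based only on the documents above, answer Yes or No."
    )

example = ConflictingQAExample(
    question="Is aspartame linked to cancer?",
    evidence=[
        EvidenceDoc("https://example.org/study", "A cohort study reported ...", "Yes"),
        EvidenceDoc("https://example.org/review", "A regulatory review concluded ...", "No"),
    ],
)
print(build_prompt(example))

A counterfactual experiment in this style would perturb one attribute of a document (e.g., its stance, stylistic framing, or relevance to the query) while holding the rest fixed, and measure how the model's Yes/No answer changes.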
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English