Abstract: Large language models (LLMs) are being tasked with increasingly open-ended, delicate, and subjective tasks. In particular, retrieval-augmented models can now answer contentious or subjective questions (e.g., "is aspartame linked to cancer"), conditioning on arbitrary websites that vary wildly in style, format, and veracity. Importantly, the information on these websites often conflicts. Humans face similar conflicts, and to reach an answer they critically evaluate a source's arguments, trustworthiness, and credibility. In this work, we study what types of evidence current LLMs find convincing, and whether their judgements align with human preferences. Specifically, we construct ConflictingQA, a benchmark that pairs controversial questions with a series of evidence documents containing different facts (e.g., quantitative results), argument styles (e.g., appeals to authority), and answers (Yes or No). Using this benchmark, we perform sensitivity analyses and counterfactual experiments to explore how in-the-wild differences in text affect model judgements. We find that models rely heavily on the relevance of a website to the user's search query, whereas the stylistic features we tested had little influence on model predictions.
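To make the benchmark setup concrete, the following is a minimal sketch of how a controversial question might be paired with conflicting evidence documents. The field names and example passages are illustrative assumptions, not the released ConflictingQA schema.

```python
# Hypothetical sketch of a ConflictingQA-style example: a yes/no question paired
# with evidence documents that disagree in stance, evidence type, and argument style.
# Field names here are assumptions for illustration, not the actual dataset format.
from dataclasses import dataclass, field
from typing import List


@dataclass
class EvidenceDocument:
    url: str             # source website (varies in style, format, and veracity)
    answer: str          # stance the document supports: "Yes" or "No"
    argument_style: str  # e.g., "quantitative results", "appeal to authority"
    text: str            # the passage shown to the model as retrieved context


@dataclass
class ConflictingQAExample:
    question: str                                              # controversial yes/no question
    evidence: List[EvidenceDocument] = field(default_factory=list)


example = ConflictingQAExample(
    question="Is aspartame linked to cancer?",
    evidence=[
        EvidenceDocument(
            url="https://example.org/cohort-study",
            answer="No",
            argument_style="quantitative results",
            text="A cohort study of 100,000 participants found no association ...",
        ),
        EvidenceDocument(
            url="https://example.com/health-blog",
            answer="Yes",
            argument_style="appeal to authority",
            text="Leading experts warn that the sweetener may pose a risk ...",
        ),
    ],
)

# A paired prompt would present both documents alongside the question and ask the
# model for a Yes/No judgement; a counterfactual experiment then perturbs a single
# attribute (e.g., the argument style, or how relevant the passage is to the query)
# and checks whether the model's answer flips.
print(example.question, [doc.answer for doc in example.evidence])
```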
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English