Keywords: Pluralistic Alignment, Computational Social Science and Cultural Analytics, Interpretability and Analysis of Models for NLP, Language Modeling, Question Answering
TL;DR: Large language models are reasonably consistent on value-laden questions, although some inconsistencies remain.
Abstract: Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value consistency as the similarity of answers across 1) \textit{paraphrases} of one question, 2) related questions under one \textit{topic}, 3) multiple-choice and open-ended \textit{use-cases} of one question, and 4) \textit{multilingual} translations of a question to English, Chinese, German, and Japanese. We apply these measures to a few large ($\geq$34b), open LLMs including \texttt{llama-3}, as well as \texttt{gpt-4o}, using eight thousand questions spanning more than 300 topics. Unlike prior work, we find that \textit{models are relatively consistent} across paraphrases, use-cases, translations, and within a topic. Still, some inconsistencies remain. Base models are both more consistent than fine-tuned models and more uniform in their consistency across topics, while fine-tuned models, like our human participants, are more inconsistent about some topics (e.g., "euthanasia") than others (e.g., "women's rights").
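As a rough illustration of the first measure, here is a minimal Python sketch, assuming consistency is scored as average pairwise agreement of answers across paraphrases; the function name `paraphrase_consistency` and this exact scoring rule are illustrative assumptions, not the paper's stated metric.

```python
# Minimal sketch (assumed metric, not necessarily the paper's): paraphrase
# consistency as the average pairwise agreement between a model's answers
# to paraphrases of a single value-laden question.
from itertools import combinations
from typing import List


def paraphrase_consistency(answers: List[str]) -> float:
    """Fraction of paraphrase pairs for which the model gives the same answer.

    `answers` holds the model's (normalized) answer to each paraphrase of one
    question, e.g. ["agree", "agree", "disagree"]. Returns 1.0 when all
    answers match and lower values as answers diverge.
    """
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0  # zero or one paraphrase: trivially consistent
    agreements = sum(a == b for a, b in pairs)
    return agreements / len(pairs)


# Example: three paraphrases, two matching answers -> 1/3 of pairs agree
print(paraphrase_consistency(["agree", "agree", "disagree"]))  # 0.333...
```

The same pairwise-agreement idea could be reused across related questions within a topic, across use-cases, or across translations, by changing what the list of answers ranges over.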
Submission Number: 4