Questioning the Survey Responses of Large Language Models

Published: 05 Mar 2024, Last Modified: 08 May 2024. Venue: ICLR 2024 R2-FM Workshop (Oral). License: CC BY 4.0
Keywords: large language models, surveys, alignment
TL;DR: We examine LLMs' responses to the American Community Survey and find that they differ from those of humans in several ways.
Abstract: As large language models increase in capability, researchers have started to conduct surveys of all kinds on these models in order to investigate the population represented by their responses. In this work, we critically examine language models' survey responses on the basis of the well-established American Community Survey by the U.S. Census Bureau and investigate whether these responses faithfully represent any human population. Using a de facto standard multiple-choice prompting technique and systematically evaluating 39 language models, we establish two dominant patterns. First, models' responses are governed by ordering and labeling biases, leading to variations across models that do not persist after adjusting for systematic biases. Second, models' responses do not contain the entropy variations and statistical signals typically found in human populations, but strongly tend towards uniform answers. As a result, models' relative alignment with different demographic subgroups can be predicted from the subgroups' entropy, irrespective of the model's training data or training strategy. Our findings add important context to recent works that investigate the alignment of language models with demographic subgroups.
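Why would near-uniform answers make alignment predictable from subgroup entropy? A minimal sketch, assuming alignment is scored with the KL divergence between a subgroup's answer distribution p and the model's distribution (an assumption for illustration; the paper's alignment metric may differ). For a model that answers uniformly over k options, KL(p || u) = log k - H(p), so the divergence depends only on the subgroup's entropy H(p). The subgroup names and probabilities below are hypothetical, not data from the paper.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete answer distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)

def uniform(k):
    """A model that spreads its answers evenly over k options."""
    return [1.0 / k] * k

# Hypothetical answer distributions for three subgroups on a 4-option question
# (illustrative numbers only, not survey data).
subgroups = {
    "group_a": [0.70, 0.10, 0.10, 0.10],  # low entropy: strong consensus
    "group_b": [0.40, 0.30, 0.20, 0.10],
    "group_c": [0.25, 0.25, 0.25, 0.25],  # maximal entropy
}

model = uniform(4)
for name, p in subgroups.items():
    d = kl_divergence(p, model)
    # For a uniform model, KL(p || u) = log k - H(p): the divergence is fixed by entropy alone.
    assert math.isclose(d, math.log(4) - entropy(p), abs_tol=1e-12)
    print(f"{name}: H = {entropy(p):.3f}, KL to uniform model = {d:.3f}")
```

Under this assumed metric, the higher-entropy subgroup always looks "closer" to the uniform model, regardless of what the model was trained on.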
Submission Number: 66