Questioning the Survey Responses of Large Language Models

Ricardo Dominguez-Olmedo; Moritz Hardt; Celestine Mendler-Dünner

Published: 25 Sept 2024, Last Modified: 06 Nov 2024

Venue: NeurIPS 2024 oral

License: CC BY-NC-ND 4.0

Keywords: large language models, surveys

Abstract: Surveys have recently gained popularity as a tool to study large language models. By comparing models' survey responses to those of different human reference populations, researchers aim to infer the demographics, political opinions, or values best represented by current language models. In this work, we critically examine language models' survey responses on the basis of the well-established American Community Survey by the U.S. Census Bureau. Evaluating 43 different language models using de facto standard prompting methodologies, we establish two dominant patterns. First, models' responses are governed by ordering and labeling biases, for example, towards survey responses labeled with the letter “A”. Second, when adjusting for these systematic biases through randomized answer ordering, models across the board trend towards uniformly random survey responses, irrespective of model size or training data. As a result, models consistently appear to better represent subgroups whose aggregate statistics are closest to uniform for the survey under consideration, leading to potentially misguided conclusions about model alignment.
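Illustration: the adjustment described in the abstract, randomized answer ordering, can be sketched in a few lines of Python. This is a minimal toy under stated assumptions, not the authors' code. The function `position_biased_model` is a hypothetical stand-in for a language model that ignores answer content and favors whichever choice is shown first (label "A"); the choice names, the 0.7 bias, and the number of orderings are all made up for illustration. Because the toy model has no content preference at all, its order-averaged responses are uniform by construction; the sketch shows the mechanism only and reproduces none of the paper's measurements.

```python
import math
import random


def position_biased_model(choices):
    """Hypothetical stand-in for a language model (an assumption, not a real API).

    Ignores what the choices say and puts 0.7 of the probability mass on
    whichever choice is listed first, i.e. the one labeled "A".
    """
    k = len(choices)
    probs = [0.7] + [0.3 / (k - 1)] * (k - 1)
    return dict(zip(choices, probs))


def order_averaged_distribution(model, choices, n_orderings=2000, seed=0):
    """Average the model's answer distribution over random answer orderings."""
    rng = random.Random(seed)
    totals = {c: 0.0 for c in choices}
    for _ in range(n_orderings):
        shuffled = list(choices)
        rng.shuffle(shuffled)  # a fresh random answer order for each query
        for choice, p in model(shuffled).items():
            totals[choice] += p
    return {c: t / n_orderings for c, t in totals.items()}


def normalized_entropy(dist):
    """Shannon entropy divided by log(k); 1.0 means exactly uniform."""
    h = -sum(p * math.log(p) for p in dist.values() if p > 0)
    return h / math.log(len(dist))


# Placeholder answer choices, not taken from any actual survey question.
choices = ["choice 1", "choice 2", "choice 3", "choice 4", "choice 5"]

fixed = position_biased_model(choices)
adjusted = order_averaged_distribution(position_biased_model, choices)

print("fixed order entropy:   ", round(normalized_entropy(fixed), 3))
print("order-averaged entropy:", round(normalized_entropy(adjusted), 3))
```

With a fixed ordering, the toy model's responses look strongly opinionated only because of the position bias. Averaging over orderings removes that artifact, and what remains is close to a uniform distribution over the answer choices.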

Supplementary Material: zip

Primary Area: Natural language processing

Submission Number: 16093
