Keywords: llm, agent, decision-making, nudge, alignment
Abstract: LLMs are being set loose in complex, real-world environments involving sequential decision-making and tool use, often making choices on behalf of human users. Little is known about the distribution of such choices and how susceptible they are to different choice architectures. We perform a case study with a few such LLMs on a multi-attribute tabular decision-making problem, under the canonical default-option nudge and additional prompting strategies. We show that, despite superficial similarities to human choice distributions, these models differ in subtle but important ways. First, they show much higher susceptibility to the default-option nudge. Second, they diverge in points earned, being affected by factors such as the idiosyncrasy of available prizes. Third, they diverge in information-acquisition strategies, e.g., incurring substantial cost to reveal too much information, or selecting without revealing any. Finally, we show that simple prompt nudges such as self-explanations can shift the choice distribution, and that few-shot prompting with human data can induce greater alignment. These findings suggest that more information is needed before deploying models as agents or assistants acting on behalf of users in complex environments.
Submission Number: 58