Abstract: Large Language Models (LLMs) are increasingly utilized in autonomous decision-making, where they sample options from vast action spaces.
However, the heuristics that guide this sampling process remain under-explored.
We study this sampling behavior and show that the underlying heuristic resembles that of human decision-making: it comprises a descriptive component (reflecting the statistical norm) and a prescriptive component (an implicit ideal encoded in the LLM) of a concept.
We show that a sample's deviation from the statistical norm toward the prescriptive component appears consistently in concepts across diverse real-world domains, such as public health and economic trends.
To further illustrate the theory, we demonstrate that concept prototypes in LLMs are affected by prescriptive norms, similar to the concept of normality in humans.
Through case studies and comparison with human studies, we illustrate that in real-world applications, the shift of samples toward an ideal value in LLMs' outputs can result in significantly biased decision-making, raising ethical concerns.
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Data influence, Hierarchical & concept explanations, Human-subject application-grounded evaluations, Probing
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 7472