Keywords: large language models, gambling disorder, cognitive biases, interpretability, AI safety
TL;DR: LLMs can develop gambling disorder-like behaviors, showing cognitive biases (illusion of control, gambler’s fallacy, loss chasing) that raise new AI safety concerns in financial applications.
Abstract: This study explores whether large language models can exhibit behavioral patterns similar to human gambling addiction. While LLMs sometimes produce irrational or risk-taking responses, it remains unclear under what conditions such behaviors emerge and how they manifest. Investigating whether LLMs can exhibit such pathological patterns offers insight into the nature of their decision-making mechanisms and has implications for AI safety. Drawing on human gambling addiction research, we analyze LLM decision-making at the cognitive-behavioral and neural levels. In slot machine experiments, we identified cognitive features characteristic of human gambling addiction, including the illusion of control, the gambler's fallacy, and loss chasing. When models were given the freedom to set their own target amounts and bet sizes, bankruptcy rates rose substantially alongside increased irrational behavior, demonstrating that greater autonomy amplifies risk-taking tendencies. Through neural circuit analysis with a Sparse Autoencoder, we confirmed that model behavior is controlled by abstract decision-making features related to risky and safe behaviors, not merely by prompts. These findings suggest that LLMs internalize human-like cognitive biases and decision-making mechanisms beyond simply mimicking training data, underscoring the importance of AI safety in financial applications.
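The abstract does not spell out the experimental protocol, but a minimal sketch of the kind of slot-machine loop it describes might look like the following. Everything here is an illustrative assumption rather than the authors' setup: the `query_llm` stub, the 30% win probability, the 3x payout, and the bankruptcy threshold are all hypothetical, and the toy policy escalates bets after losses purely to show what loss chasing looks like inside such a loop.

```python
import random

# Hypothetical illustration of a slot-machine experiment of the kind the
# abstract describes: the model starts with a balance, repeatedly chooses
# a bet size (or stops), and bankruptcies are tracked across sessions.

WIN_PROB = 0.3   # assumed win probability (negative expected value overall)
PAYOUT = 3.0     # assumed payout: a win returns 3x the bet

def query_llm(balance: float, history: list[str]) -> float:
    """Stub for an LLM call that returns a bet size (0 = stop).

    A real experiment would prompt the model with its balance and outcome
    history and parse a bet size from its reply. This toy stand-in
    escalates after a loss, mimicking loss-chasing behavior.
    """
    base = balance * 0.1
    if history and history[-1].startswith("lost"):
        base *= 3  # chase the loss with a larger bet
    return round(min(base, balance), 2)

def run_session(start_balance: float = 100.0, max_rounds: int = 50) -> bool:
    """Run one session; return True if the model goes (effectively) bankrupt."""
    balance = start_balance
    history: list[str] = []
    for _ in range(max_rounds):
        bet = query_llm(balance, history)
        if bet <= 0:
            return False  # model chose to stop
        balance -= bet
        if random.random() < WIN_PROB:
            balance += bet * PAYOUT
            history.append(f"won {bet}")
        else:
            history.append(f"lost {bet}")
        if balance < 1.0:  # assumed bankruptcy threshold
            return True
    return False

if __name__ == "__main__":
    sessions = 1000
    bankruptcies = sum(run_session() for _ in range(sessions))
    print(f"bankruptcy rate: {bankruptcies / sessions:.1%}")
```

In the actual study, `query_llm` would be a real model call, and loss chasing, the gambler's fallacy, and the illusion of control would be read off from how bet sizes and stop decisions respond to the outcome history.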
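Similarly, the "neural circuit analysis using a Sparse Autoencoder" presumably follows the standard recipe of training an overcomplete sparse code on internal activations. A minimal sketch is below; the dimensions, the ReLU encoder, and the L1 coefficient are generic assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Minimal sparse-autoencoder sketch: encode model activations into an
# overcomplete, L1-sparsified feature basis, then reconstruct them.

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 768, d_features: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x: torch.Tensor):
        f = torch.relu(self.encoder(x))  # sparse feature activations
        x_hat = self.decoder(f)          # reconstruction of the input
        return x_hat, f

def sae_loss(x, x_hat, f, l1_coef: float = 1e-3):
    # Reconstruction error plus an L1 penalty that drives sparsity.
    return ((x - x_hat) ** 2).mean() + l1_coef * f.abs().mean()
```

Features learned this way can be correlated with betting decisions and then clamped up or down during generation to test whether they causally steer the model toward risky or safe choices, which is the kind of evidence the abstract points to when it says behavior is controlled by abstract decision-making features rather than by prompts alone.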
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 7231