Keywords: Large Language Models (LLMs), Moral Judgment, Attention Mechanisms, AI Alignment, Ethical Dilemmas
Abstract: We study how large language models (LLMs) differ in moral judgment when prompted for fast, intuition-like answers versus explicit reasoning. Across taboo dilemmas, trolley problems, and AI principle-conflict scenarios, non-reasoning models align more closely with human intuitions, while reasoning-enabled models tend to favor consequentialist or rule-based choices, sometimes overriding autonomy and privacy. This work makes two contributions: (i) a controlled evaluation framework for isolating the behavioral effects of reasoning in LLMs, and (ii) empirical evidence that reasoning capabilities can induce normative shifts misaligned with human values. These results highlight a structural tension in model alignment: as LLMs become more capable of reasoning, they may not converge toward human-like ethics but may instead follow paths that abstract away moral intuitions. This raises critical questions for the design of safe and aligned artificial general intelligence.
Submission Number: 96