HEART: Emotionally-driven test-time scaling of Language Models

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Prompt Engineering, Reasoning, Affective Computing, Iterative Refinement
Abstract: Test-time scaling has shown considerable success in improving the performance of language models on complex reasoning tasks without requiring fine-tuning. However, current strategies such as self-reflection focus primarily on logical or structural refinement and do not leverage the guiding potential of affective feedback. Inspired by psychological research showing that emotions modulate cognitive performance, we introduce \textit{HEART}, a novel framework that uses emotionally driven prompts for iterative self-correction. \textit{HEART} provides feedback through a curated set of concise, emotionally charged phrases grounded in the six universal emotions identified by Paul Ekman. By systematically varying the emotional tone of the feedback across iterations, our method guides the model to escape flawed reasoning paths and explore more promising alternatives. We evaluate our framework on challenging reasoning benchmarks, including OlympiadBench, Humanity's Last Exam, SimpleQA, and GPQA Diamond, demonstrating robustness across diverse tasks. Our results reveal a notable phenomenon: when deployed in a simulated Human-in-the-Loop (HITL) setting, this affective iteration protocol unlocks deeper reasoning, yielding consistent and substantial accuracy gains over affect-sterile baselines. This comparative analysis also identifies a key bottleneck for autonomous deployment: while \textit{HEART} successfully generates superior reasoning paths, performance in the autonomous setting is limited by the synthesis of those paths into a final answer rather than by reasoning generation itself. This finding pinpoints a critical new research direction, shifting the challenge from pure reasoning generation to autonomous reasoning synthesis. Our findings suggest that the next frontier in machine reasoning may lie not just in refining logic, but also in understanding and leveraging the ``\textit{HEART}'' of the models.
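The abstract describes an iterative protocol: generate an answer, attach an emotionally charged feedback phrase keyed to one of Ekman's six emotions, and re-prompt, rotating the emotion each round. A minimal sketch of that loop is below; the paper's actual curated phrases, model interface, and stopping criterion are not given here, so the `FEEDBACK_PHRASES`, `model`, and `judge` names are illustrative stand-ins, not the authors' implementation.

```python
import itertools

# The six universal emotions from Ekman's taxonomy, as named in the abstract.
EKMAN_EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise"]

# Hypothetical emotionally charged feedback phrases; the paper curates its own
# set, and these stand-ins only illustrate the shape of the protocol.
FEEDBACK_PHRASES = {
    "anger": "This answer is frustratingly sloppy. Redo it properly.",
    "disgust": "This reasoning is unacceptable. Start over carefully.",
    "fear": "A mistake here would be disastrous. Re-check every step.",
    "happiness": "Great progress! Now polish the remaining weak steps.",
    "sadness": "It is disappointing to see this error. Please correct it.",
    "surprise": "What an unexpected result. Verify the reasoning from scratch.",
}

def heart_iterate(question, model, judge, max_iters=6):
    """Iterative affective refinement: each round appends a feedback phrase
    drawn from a rotating Ekman emotion, stopping early once `judge` accepts.
    `model` maps a prompt string to an answer string; `judge` maps an answer
    to True/False (in the HITL setting this role is played by a human)."""
    answer = model(question)
    for emotion in itertools.islice(itertools.cycle(EKMAN_EMOTIONS), max_iters):
        if judge(answer):
            break
        prompt = (f"{question}\n\nYour previous answer: {answer}\n"
                  f"{FEEDBACK_PHRASES[emotion]}\nRevise your answer.")
        answer = model(prompt)
    return answer
```

The rotation through `itertools.cycle` captures the "systematically varying the emotional tone across iterations" idea; any schedule over the six emotions would fit the same loop.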
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 20405