MotiveBench: How Far Are We From Human-Like Motivational Reasoning in Large Language Models?

ACL ARR 2024 December Submission 1828 Authors

16 Dec 2024 (modified: 05 Feb 2025)
License: CC BY 4.0
Abstract:

Large language models (LLMs) have been widely adopted as the core of agent frameworks in various scenarios, such as social simulations and AI companions. However, the extent to which they can replicate human-like motivations remains underexplored. Existing benchmarks are constrained by simplistic scenarios and the absence of character identities, creating an information asymmetry relative to real-world situations. To address this gap, we propose MotiveBench, which consists of 200 rich contextual scenarios and 600 reasoning tasks covering multiple levels of motivation. Using MotiveBench, we conduct extensive experiments on seven popular model families, comparing different scales and versions within each family. Our analysis reveals several notable findings, such as the difficulty LLMs face in reasoning about "love & belonging" motivations and their tendencies toward excessive rationality and idealism. These insights highlight a promising direction for future research on the humanization of LLMs.

Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, evaluation
Languages Studied: English
Submission Number: 1828