Keywords: altruism, large language models, prosocial behavior, alignment, benchmark, game theory, cooperation, social decision-making, prompt engineering, fine-tuning
TL;DR: RITUAL is the first benchmark to test and improve altruism in LLMs, revealing that prosocial behavior is context-dependent but steerable with prompts and fine-tuning.
Abstract: Current methods for evaluating altruism in large language models (LLMs) are insufficient, often relying on single game-theoretic scenarios that fail to capture the complex, context-dependent nature of prosocial behavior. As LLMs are increasingly deployed in personal and corporate settings, their tendency toward self-serving actions poses a significant problem for alignment with human values. Yet no comprehensive benchmark currently exists to quantitatively measure altruism in LLMs. We introduce RITUAL (Realistic Interactive Tests for Uncovering Altruism in LLMs), a novel benchmark that evaluates altruistic behavior across a diverse set of game-theoretic scenarios, including the Prisoner’s Dilemma, congestion games, and the Dictator game. Unlike prior approaches, RITUAL employs one or more mathematical indices per game (such as cooperation frequency, sacrifice ratio, and social welfare weighting), enabling a multidimensional assessment of altruism. Beyond evaluation, we explore two methods to enhance altruistic behavior: prompt engineering and supervised fine-tuning. Our findings show that LLMs do not exhibit a uniform form of altruism; instead, their prosocial tendencies are highly scenario-dependent and context-specific. No single model consistently outperforms others across all tasks, but targeted interventions significantly improve altruistic behavior in most cases. These results underscore the need for multi-index evaluation to capture the richness of LLMs’ social decision-making and offer a practical path toward developing more reliably altruistic AI systems.
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 23125