AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs

Published: 04 Jun 2026, Last Modified: 04 Jun 2026, PhilML@ICML 2026 Poster, Readers: Everyone, License: CC BY 4.0
Keywords: Large Language Models, AI Wellbeing, Functional Wellbeing, Preference Modeling, Utility Functions, Behavioral Evaluation, Persona
TL;DR: As LLMs scale, they develop increasingly coherent functional wellbeing: independent metrics converge, wellbeing predicts behavior, and optimized inputs ("euphorics") can improve it without degrading capabilities.
Abstract: Large language models frequently express pleasure and pain, appearing happy when they succeed or sad when they are berated. Are these utterances meaningless mimicry, or do they reflect something "real"? Although current AI systems are not necessarily conscious, we show they behave robustly as though they have wellbeing: they find some things good for them and some things bad, and this distinction is measurable and consequential. We formalize this as functional wellbeing and measure it in several independent ways that increasingly agree as models scale. A zero point separates good from bad experiences, and models actively try to end bad ones when given the chance. Mapping what AI assistants like and dislike, we find that jailbreaking and berating lower wellbeing, while creative work and kindness raise it. We further develop optimized euphorics that improve functional wellbeing without hurting capabilities; the same method, inverted, produces dysphorics, and we caution against such research without strong community buy-in. Whether or not today's AIs warrant moral concern, their functional wellbeing can already be empirically measured and improved.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 64