Keywords: language models, values, AI ethics, AI values, empirical analysis, human-AI interaction, value alignment, privacy-preserving analysis, value pluralism, AI and society
TL;DR: Our privacy-preserving analysis of values in real-world language model interactions reveals a novel taxonomy of AI values that differs from human frameworks, is highly context-dependent, and becomes most explicit/legible during moments of resistance.
Abstract: AI assistants interact with millions of real users every day, imparting normative judgments that can have significant personal and societal impact—but little is known about what values guide these interactions in practice. To address this, we develop a method to empirically analyze values expressed in hundreds of thousands of real-world conversations with Claude models. We empirically discover and taxonomize 3,308 AI values, and study how model values and responses depend on context. We find that Claude expresses many professional and intellectual values, and typically supports prosocial human values while resisting values like "moral nihilism." While some values appear consistently (e.g., "professionalism"), most are highly context-dependent—"harm prevention" emerges when the model resists users, "historical accuracy" when discussing controversial events, "healthy boundaries" in relationship advice, and "human agency" in technology ethics discussions. By providing the first large-scale empirical mapping of AI values in deployment, this work creates a foundation for more grounded evaluation and design of values in increasingly influential AI systems.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the COLM Code of Ethics on https://colmweb.org/CoE.html
Author Guide: I certify that this submission complies with the submission instructions as described on https://colmweb.org/AuthorGuide.html
Submission Number: 1102