Keywords: pluralistic alignment, preference learning, internal state, affective computing, preference aggregation, personalization
TL;DR: People's preferences are affected by their (transient) internal state, so this state should be considered when aligning AI.
Abstract: Every reader is constantly changing; the same text may be received differently by the same person across affective states, attentional contexts, and frames of reference. Current alignment work recognizes the importance of pluralistic perspectives across individuals and groups, yet often treats interpretation as stable within an individual. We argue for a finer unit of alignment: internal state. Drawing from cognitive psychology, we conduct studies with language models as annotators to show that distinct affective states produce divergent preferences that aggregation obscures. We find that standard inter-annotator agreement diagnostics cannot distinguish this structured divergence from random noise. We discuss implications for preference data collection, downstream applications, and the study of how internal states shape miscommunication.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 134