On the Affective Alignment of Language Models with Partisan Perspectives

Anonymous

16 Dec 2023 · ACL ARR 2023 December Blind Submission · Readers: Everyone
TL;DR: We measure the affective alignment between LMs and humans.
Abstract: This study examines the alignment and steerability of language models (LMs) in generating responses that mirror human affect, including emotions and moral sentiments, in sociopolitical debates. While existing research focuses primarily on positional alignment, we introduce the concept of affective alignment, arguing that matching emotional and moral dimensions is essential for the reliability and acceptance of AI-generated content. Comparing LM generations to real-world Twitter messages about COVID-19 and Roe v. Wade, we assess the affective alignment of 36 LMs across diverse topics and find significant misalignment with both liberals and conservatives, exceeding the partisan divide in the US. For instruction-tuned LMs, misalignment with human affect persists despite improvements through steering. These findings highlight the critical challenge of inadvertent biases and stereotypes that LMs perpetuate from their training data. Our study underscores the need to understand and improve affective alignment in LMs, paving the way for future work to enhance their emotional and moral sensitivity for broader societal benefit.
Paper Type: long
Research Area: Computational Social Science and Cultural Analytics
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English
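
The abstract describes assessing affective alignment by comparing the emotions and moral sentiments expressed in LM generations with those in real-world partisan Twitter messages. The paper's actual measurement pipeline is not reproduced here; the sketch below is only a generic illustration of how such a comparison could be set up, turning per-text emotion labels into categorical distributions and scoring their divergence. The emotion label set, the toy data, and the use of Jensen-Shannon divergence are assumptions for illustration, not the submission's method.

```python
# Hypothetical sketch: compare the affect of LM generations with human tweets
# by converting per-text emotion labels into distributions and measuring their
# Jensen-Shannon divergence. Labels and data are illustrative only.
from collections import Counter
import math

# Assumed emotion categories; the paper's actual label set may differ.
EMOTIONS = ["anger", "fear", "joy", "sadness", "disgust", "optimism"]

def affect_distribution(labels, categories=EMOTIONS):
    """Normalized frequency of each affect category over a set of texts."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories) or 1
    return [counts[c] / total for c in categories]

def jensen_shannon(p, q, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two distributions, in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2((ai + eps) / (bi + eps)) for ai, bi in zip(a, b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy usage with made-up labels; in practice these would come from an emotion
# (or moral-sentiment) classifier applied to LM outputs and partisan tweets.
lm_labels = ["joy", "optimism", "joy", "anger", "joy"]
human_labels = ["anger", "fear", "anger", "sadness", "disgust"]

p = affect_distribution(lm_labels)
q = affect_distribution(human_labels)
print(f"Affective misalignment (JSD): {jensen_shannon(p, q):.3f}")
```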