Keywords: Embedding, Psychology, Computational Social Science
Abstract: Pre-trained transformer-based language models have revolutionized natural language processing and are increasingly important in computational psychology. However, these models' representations are optimized to quantify \textit{semantic} similarity, which does not always align with \textit{psychological} similarity. We present \textit{PsyEmbedding}, a framework for fine-tuning models (including BERT, SBERT, RoBERTa, GTE, and E5) to encode the psychological constructs within content rather than just its semantic meaning. Leveraging a dataset annotated for numerous psychological constructs (CAMEL), we introduce a balanced stratified sampling strategy to generate embeddings predictive of psychological dimensions. We evaluate PsyEmbedding across multiple textual similarity and construct representation tasks, demonstrating that our method significantly improves the alignment of embedding spaces with psychological theory.
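As a rough illustration of the balanced stratified sampling the abstract describes, here is a minimal sketch: the function name `balanced_stratified_sample`, the construct labels, and the per-label quota are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of balanced stratified sampling over construct labels;
# names and labels below are illustrative assumptions, not the authors' code.
import random
from collections import defaultdict

def balanced_stratified_sample(texts, labels, per_label, seed=0):
    """Draw up to `per_label` texts from each psychological-construct label,
    so that no single construct dominates the fine-tuning batches."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    sample = []
    for label, pool in by_label.items():
        k = min(per_label, len(pool))  # cap at pool size for rare constructs
        sample.extend((text, label) for text in rng.sample(pool, k))
    rng.shuffle(sample)  # avoid label-ordered batches
    return sample

# Usage: equal representation of two hypothetical constructs.
texts = ["I feel hopeless", "We can fix this", "Nothing matters", "Stay strong"]
labels = ["despair", "resilience", "despair", "resilience"]
print(balanced_stratified_sample(texts, labels, per_label=2))
```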
Paper Type: Short
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: human behavior analysis, sociolinguistics, NLP tools for social analysis
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 6775