Keywords: Embedding, Psychology, Computational Social Science
Abstract: Pre-trained transformer-based language models have revolutionized natural language processing and are increasingly important in computational psychology. However, these models' representations are optimized to quantify \textit{semantic} similarity, which does not always align with \textit{psychological} similarity. We present \textit{PsyEmbedding}, a framework for fine-tuning models (including BERT, SBERT, RoBERTa, GTE, and E5) to encode the psychological constructs within content rather than just its semantic meaning. Leveraging a dataset annotated for numerous psychological constructs (CAMEL), we introduce a balanced stratified sampling strategy to generate embeddings predictive of psychological dimensions. We evaluate PsyEmbedding across multiple textual similarity and construct representation tasks, demonstrating that our method significantly improves the alignment of embedding spaces with psychological theory.
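As a rough illustration of the balanced stratified sampling the abstract describes, here is a minimal sketch: the function name `balanced_stratified_sample`, the construct labels, and the per-label quota are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of balanced stratified sampling over construct labels;
# names and labels below are illustrative assumptions, not the authors' code.
import random
from collections import defaultdict

def balanced_stratified_sample(texts, labels, per_label, seed=0):
    """Draw up to `per_label` texts from each psychological-construct label,
    so that no single construct dominates the fine-tuning batches."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    sample = []
    for label, pool in by_label.items():
        k = min(per_label, len(pool))  # cap at pool size for rare constructs
        sample.extend((text, label) for text in rng.sample(pool, k))
    rng.shuffle(sample)  # avoid label-ordered batches
    return sample

# Usage: equal representation of two hypothetical constructs.
texts = ["I feel hopeless", "We can fix this", "Nothing matters", "Stay strong"]
labels = ["despair", "resilience", "despair", "resilience"]
print(balanced_stratified_sample(texts, labels, per_label=2))
```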
Paper Type: Short
Research Area: Computational Social Science, Cultural Analytics, and NLP for Social Good
Research Area Keywords: human behavior analysis, sociolinguistics, NLP tools for social analysis
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 6775