Abstract: Large Language Models (LLMs) are increasingly used in human-centered applications, yet their ability to model diverse psychological constructs is not well understood. In this study, we systematically evaluate the ability of a range of Transformer-based LLMs to predict psychological variables across five major dimensions: affect, substance use, mental health, sociodemographics, and personality. Analyses span three temporal levels (daily text responses, two-week aggregates, and user-level text collected over two years), allowing us to examine how each model’s strengths align with the underlying stability of different constructs. The findings show that mental health signals emerge as the most reliably captured dimension, possibly because people often use detailed, specific language when describing their emotional experiences, which makes these cues easier for models to detect. At the daily scale, context-rich embeddings from DeBERTa and HaRT excel at capturing short-term emotional fluctuations, whereas few-shot Llama3-8B is particularly adept at modeling nuanced substance use behaviors at the two-week interval. Aggregating text over the entire study period yields stronger correlations for sociodemographic factors (e.g., age and income). These results offer actionable insights into the design of LLM-based approaches for psychological assessment, emphasizing the importance of selecting model architectures and temporal aggregation techniques suited to the stability and nature of the target construct.
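To make the embedding-based setup described in the abstract concrete, below is a minimal sketch (not the authors' actual pipeline) of how daily text could be encoded with DeBERTa, aggregated to the user level, and regressed onto a psychological outcome. The input file, column names ("user_id", "text", "score"), and the choice of ridge regression are assumptions made purely for illustration.

```python
# Illustrative sketch only: predict a psychological outcome from mean-pooled
# DeBERTa embeddings aggregated per user. The CSV path and column names
# ("user_id", "text", "score") are hypothetical placeholders.
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from scipy.stats import pearsonr

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden state into a single text embedding."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)             # (dim,)

df = pd.read_csv("daily_responses.csv")              # hypothetical data file

# Temporal aggregation: average the embeddings of all of a user's texts.
user_vecs = (
    df.groupby("user_id")["text"]
      .apply(lambda texts: torch.stack([embed(t) for t in texts]).mean(dim=0))
)
X = torch.stack(user_vecs.tolist()).numpy()
y = df.groupby("user_id")["score"].mean().loc[user_vecs.index].values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
reg = Ridge(alpha=1.0).fit(X_tr, y_tr)
r, _ = pearsonr(y_te, reg.predict(X_te))
print(f"Pearson r on held-out users: {r:.3f}")
```

The same skeleton applies at the daily or two-week level by changing the grouping key; the held-out Pearson correlation mirrors the evaluation metric implied in the abstract.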
Paper Type: Long
Research Area: Human-Centered NLP
Research Area Keywords: Psychological States, Psychological Dispositions, Psychological Traits, Human Behavior, Human-Centered NLP, Computational Social Science
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English
Submission Number: 4870