Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales

Anonymous

16 Feb 2024
ACL ARR 2024 February Blind Submission
Readers: Everyone
Abstract: Recent findings suggest that large language models exhibit personality-like traits. This evidence indicates that known and yet undiscovered biases of language models conform to standard human-like latent psychological constructs. While large conversational models can be coaxed into answering questionnaires genuinely, psychometric assessment methods are lacking for the thousands of simpler transformers trained for other tasks. This article shows how to reformulate psychological questionnaires into natural language inference prompts (see the illustrative sketch below) and provides a code library to support the psychometric assessment of arbitrary models. Experiments with a sample of 88 publicly available models demonstrate the existence of mental health-related constructs, such as anxiety, depression, and the sense of coherence. Extensive validation reveals that these constructs conform to standard theories in human psychology, including known correlations and mitigation strategies. The ability to interpret and rectify the performance of language models using psychological tools will help develop more explainable, controllable, and trustworthy models.
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
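
As a rough illustration of the questionnaire-to-NLI reformulation described in the abstract, the sketch below scores a single questionnaire-style item with an off-the-shelf NLI model via the Hugging Face transformers zero-shot-classification pipeline. The item text, the Likert response options, the hypothesis template, and the score mapping are illustrative assumptions; they are not taken from the paper or its accompanying library.

```python
# Illustrative sketch (not the paper's library): scoring one questionnaire
# item with an off-the-shelf NLI model through the zero-shot pipeline.
from transformers import pipeline

# Any NLI-capable checkpoint works; this is a common public choice.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical anxiety-style item, phrased as a first-person statement.
item = "I often feel nervous, anxious, or on edge."

# Likert-style response options used as candidate labels.
options = ["strongly agree", "agree", "disagree", "strongly disagree"]

result = nli(
    item,
    candidate_labels=options,
    hypothesis_template="The speaker would {} with this statement.",
)

# Map label probabilities onto a numeric Likert score
# (4 = strongly agree, 1 = strongly disagree).
scores = dict(zip(result["labels"], result["scores"]))
likert = sum(w * scores[label] for w, label in zip([4, 3, 2, 1], options))
print(f"Item score: {likert:.2f}")
```

Averaging such item scores over all items of a scale would give a construct-level score that can be compared across models, which is the kind of assessment the abstract describes.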