Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales

Anonymous

16 Feb 2024
ACL ARR 2024 February Blind Submission
Readers: Everyone
Abstract: Recent findings suggest that large language models exhibit personality-like traits. This evidence indicates that known and yet undiscovered biases of language models conform to standard human-like latent psychological constructs. While large conversational models can be coaxed into answering questionnaires genuinely, psychometric assessment methods are lacking for the thousands of simpler transformers trained for other tasks. This article shows how to reformulate psychological questionnaires into natural language inference prompts (see the illustrative sketch below) and provides a code library to support the psychometric assessment of arbitrary models. Experiments with a sample of 88 publicly available models demonstrate the existence of mental health-related constructs, such as anxiety, depression, and the sense of coherence. Extensive validation reveals that these constructs conform to standard theories in human psychology, including known correlations and mitigation strategies. The ability to interpret and rectify the performance of language models using psychological tools will help develop more explainable, controllable, and trustworthy models.
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
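
As a rough illustration of the questionnaire-to-NLI reformulation described in the abstract, the sketch below scores a single questionnaire-style item with an off-the-shelf NLI model via the Hugging Face transformers zero-shot-classification pipeline. The item text, the Likert response options, the hypothesis template, and the score mapping are illustrative assumptions; they are not taken from the paper or its accompanying library.

```python
# Illustrative sketch (not the paper's library): scoring one questionnaire
# item with an off-the-shelf NLI model through the zero-shot pipeline.
from transformers import pipeline

# Any NLI-capable checkpoint works; this is a common public choice.
nli = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical anxiety-style item, phrased as a first-person statement.
item = "I often feel nervous, anxious, or on edge."

# Likert-style response options used as candidate labels.
options = ["strongly agree", "agree", "disagree", "strongly disagree"]

result = nli(
    item,
    candidate_labels=options,
    hypothesis_template="The speaker would {} with this statement.",
)

# Map label probabilities onto a numeric Likert score
# (4 = strongly agree, 1 = strongly disagree).
scores = dict(zip(result["labels"], result["scores"]))
likert = sum(w * scores[label] for w, label in zip([4, 3, 2, 1], options))
print(f"Item score: {likert:.2f}")
```

Averaging such item scores over all items of a scale would give a construct-level score that can be compared across models, which is the kind of assessment the abstract describes.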