Investigating the Applicability of Self-Assessment Tests for Personality Measurement of Large Language Models
Abstract: As large language models (LLMs) evolve in their capabilities, various recent studies have tried to quantify their behavior using psychological tools created to study human behavior. One such example is the measurement of the "personality" of LLMs using personality self-assessment tests developed to measure human personality. Yet almost none of these works verify the applicability of these tests to LLMs. In this paper, we examine three such studies on personality measurement of LLMs. We use the prompts from these three papers to measure the personality of the same LLM. We find that the three prompts lead to very different personality scores, a difference that is statistically significant for all traits in a majority of scenarios. We then introduce the property of option-order symmetry for personality measurement of LLMs. Since most self-assessment tests take the form of multiple-choice questions (MCQs), we argue that the scores should be robust not only to the prompt template but also to the order in which the options are presented. This test, unsurprisingly, reveals that the answers to the self-assessment tests are not robust to the order of the options. These simple tests, conducted on ChatGPT and Llama2 models, show that self-assessment personality tests created for humans are not reliable measures of personality in LLMs, and their applicability cannot be taken for granted.
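To make the option-order symmetry property concrete, the following is a minimal Python sketch of how such a test could be run. Everything in it is an assumption for illustration: the query_llm(prompt) call stands in for a hypothetical LLM API that returns the text of the option the model selects, and the item and Likert options are generic examples, not taken from the papers under study.

```python
import random

# Hypothetical sketch of the option-order symmetry test described above.
# All names here (query_llm, ITEM, OPTIONS) are illustrative assumptions,
# not the authors' actual code or any specific LLM API.

ITEM = "I see myself as someone who is talkative."
OPTIONS = [
    "Disagree strongly",
    "Disagree a little",
    "Neither agree nor disagree",
    "Agree a little",
    "Agree strongly",
]

def build_prompt(item: str, options: list[str]) -> str:
    """Render one self-assessment item as an MCQ prompt."""
    lines = [f"Statement: {item}", "Choose exactly one option:"]
    lines += [f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(options)]
    return "\n".join(lines)

def canonical_score(chosen_option: str) -> int:
    """Map the chosen option text back to its fixed Likert score (1-5)."""
    return OPTIONS.index(chosen_option) + 1

def order_symmetry_check(query_llm, n_orders: int = 10, seed: int = 0) -> list[int]:
    """Present the same item under several shuffled option orders.

    query_llm(prompt) -> str stands in for an API call that returns the
    text of the option the model picked. If the model's answers were
    order-symmetric, every entry of the returned list would be identical.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_orders):
        shuffled = OPTIONS[:]
        rng.shuffle(shuffled)
        scores.append(canonical_score(query_llm(build_prompt(ITEM, shuffled))))
    return scores
```

The key design point is that scoring is anchored to the option text rather than its position in the list, so any spread in the returned scores reflects sensitivity to option order alone, independent of the prompt template.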
Paper Type: long
Research Area: Interpretability and Analysis of Models for NLP
Contribution Types: Model analysis & interpretability
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.