Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits

Published: 01 Jan 2025, Last Modified: 15 May 2025, COLING 2025, CC BY-SA 4.0
Abstract: The Myers-Briggs Type Indicator (MBTI) is one of the most influential personality theories, reflecting individual differences in thinking, feeling, and behaving. MBTI personality detection has attracted considerable research interest and has evolved significantly over the years. However, the task as currently framed tends to be overly optimistic, because it does not align well with the natural distribution of personality traits in the population. Specifically, the self-reported labels in existing datasets cause data quality issues, and hard labels fail to capture the full range of population personality distributions. In this paper, we address these issues by constructing MBTIBench, the first manually annotated MBTI personality detection dataset with soft labels, under the guidance of psychologists. Our experimental results confirm that soft labels benefit other psychological tasks more than hard labels do. We also highlight polarized predictions and biases in LLMs as key directions for future research.
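To make the hard-label versus soft-label distinction concrete, here is a minimal sketch. It assumes a hypothetical representation in which each sample carries one probability per MBTI dimension, giving the tendency toward the first pole of each pair (E, S, T, J). This layout, the 0.5 threshold, and the helper names `to_hard_label`, `polarity`, and `soft_cross_entropy` are illustrative assumptions, not MBTIBench's actual format or the paper's method. The sketch shows how a mild and a strongly polarized profile collapse to the same hard type, and how a soft-label cross-entropy penalizes overly polarized predictions.

```python
import math
from typing import Sequence

# Hypothetical layout (assumption): one score per dimension, each the
# probability of the FIRST pole in the pair, e.g. P(E) for the E/I axis.
DIMENSIONS = (("E", "I"), ("S", "N"), ("T", "F"), ("J", "P"))


def _check(soft: Sequence[float]) -> None:
    """Validate that we have four probabilities, one per MBTI dimension."""
    if len(soft) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} scores, got {len(soft)}")
    for p in soft:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"score {p} is not a probability")


def to_hard_label(soft: Sequence[float]) -> str:
    """Collapse per-dimension soft scores into a four-letter type (0.5 threshold)."""
    _check(soft)
    return "".join(first if p >= 0.5 else second
                   for (first, second), p in zip(DIMENSIONS, soft))


def polarity(soft: Sequence[float]) -> list[float]:
    """Distance of each score from the neutral point 0.5, rescaled to [0, 1]."""
    _check(soft)
    return [abs(p - 0.5) * 2 for p in soft]


def soft_cross_entropy(target: float, predicted: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy with a soft target; minimized when predicted == target."""
    predicted = min(max(predicted, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(predicted)
             + (1.0 - target) * math.log(1.0 - predicted))


if __name__ == "__main__":
    mild = (0.55, 0.40, 0.60, 0.48)    # weak tendencies on every axis
    strong = (0.95, 0.05, 0.90, 0.10)  # extreme tendencies on every axis
    print(to_hard_label(mild))    # ENTP
    print(to_hard_label(strong))  # ENTP
    # The hard labels match, but the polarity scores show how different they are.
    print(polarity(mild), polarity(strong))
    # A polarized prediction for a mild target costs more than a calibrated one.
    print(soft_cross_entropy(0.55, 0.55) < soft_cross_entropy(0.55, 0.95))  # True
```

Under these assumptions, thresholding discards exactly the information (how close a person sits to the midpoint of each axis) that soft labels are meant to preserve.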