BigFive: A Dataset of Coarse- and Fine-Grained Personality Characteristics

Anonymous

16 Nov 2021 (modified: 05 May 2023), ACL ARR 2021 November Blind Submission
Abstract: Inferring users' personalities from the short texts they publish has a wide range of important applications, from detecting abnormal behavior among online users to accurate personalized recommendation. Progress in this area would benefit from large-scale datasets with both coarse- and fine-grained typologies that can be adapted to multiple downstream tasks. To this end, this paper introduces BigFive, a large, high-quality dataset manually annotated by experts. BigFive contains 13,478 Chinese phrases labeled with five coarse-grained categories and 30 fine-grained categories. A detailed data analysis demonstrates the reliability of the annotations at both the personality level (five categories) and the dimension level (30 categories). In addition, a strong baseline is built by fine-tuning a BERT model. Our BERT-based model achieves an average F1-score of 0.33 (std = 0.24) over the 30 fine-grained categories and 0.66 (std = 0.05) over the five coarse-grained categories. These results suggest that there is substantial room for improvement.
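The abstract reports each result as an average F1-score with a standard deviation. Assuming this means the mean and standard deviation of the per-category (one-vs-rest) F1 scores, which the abstract does not state explicitly, the summary can be sketched as below. The function names and toy labels are hypothetical, and the use of the population standard deviation is also an assumption; `statistics.stdev` would give the sample version instead.

```python
from statistics import mean, pstdev


def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its one-vs-rest F1 score.

    Uses F1 = 2*TP / (2*TP + FP + FN); a label with no true or predicted
    instances gets 0.0 (an assumed convention for the undefined case).
    """
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def summarize(y_true, y_pred, labels):
    """Average F1 across categories and its (population) standard deviation."""
    values = list(per_class_f1(y_true, y_pred, labels).values())
    return mean(values), pstdev(values)


# Hypothetical toy example with two categories "a" and "b".
avg_f1, std_f1 = summarize(["a", "a", "b", "b"], ["a", "b", "b", "b"], ["a", "b"])
print(f"average F1 = {avg_f1:.2f} (std = {std_f1:.2f})")
```

A large standard deviation, as reported for the 30 fine-grained categories, indicates that per-category performance varies widely: some categories are predicted well while others are predicted poorly.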