Abstract: Automatic personality detection has evolved from simple text classification to sophisticated multimodal analysis, reflecting the recognition that personality manifests along multiple dimensions beyond textual data. This shift highlights the need for datasets that accurately capture the complexity of human personality through diverse modalities. We introduce the Multimedia Conversational Personality Dataset (MMPD), a large and diverse dataset built from 305 movies and 14 TV series, comprising over 46k dialogues, 552k utterances, 4,016 characters, and 963 hours of video. MMPD addresses the challenges of existing datasets by offering majority-voted personality annotations and detailed relationship networks. It also provides a new method for matching subtitles with original scripts, paving the way for advanced analyses of personality dynamics across various contexts.
Paper Type: long
Research Area: Computational Social Science and Cultural Analytics
Contribution Types: Data resources
Languages Studied: English