Social Perception Prediction for MuSe 2024: Joint Learning of Multiple Perceptions

Published: 01 Jan 2024 · Last Modified: 06 Jan 2025 · MuSe@ACM Multimedia 2024 · CC BY-SA 4.0
Abstract: In this paper, we present our method for the MuSe 2024 Perception sub-challenge, in which labels for 21 social perceptions are provided and 16 of them must be predicted. Joint learning is central to our approach: it integrates multiple perceptions to improve prediction accuracy. We fully exploit the LMU-ELP dataset, jointly modeling the 16 target perceptions together with the 5 additional perceptions that are labeled but not required for prediction, and incorporating the Pearson correlation coefficient (PCC) structure among them. Visual, audio, and text modality features serve as the multimodal input to MLP encoders, with one encoder per perception, for 21 encoders in total. All resulting embeddings are stacked and multiplied by a learnable PCC matrix, initialized from the 21 perceptions' PCC matrix, followed by an attention block for further joint learning. Our method achieves a mean Pearson's correlation coefficient of 0.4098 and ranks first in this challenge.
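The pipeline described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: all dimensions (`D_IN`, `D_EMB`), the MLP depth, and the use of identity as a stand-in for the dataset's PCC matrix are assumptions; the actual system trains these components end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PERCEPTIONS = 21   # perceptions labeled in the LMU-ELP dataset
D_IN = 512           # fused audio/visual/text feature size (assumed)
D_EMB = 64           # per-perception embedding size (assumed)

# One small MLP encoder per perception (random weights stand in for
# trained parameters; sizes are hypothetical).
W1 = rng.standard_normal((N_PERCEPTIONS, D_IN, D_EMB)) * 0.02
W2 = rng.standard_normal((N_PERCEPTIONS, D_EMB, D_EMB)) * 0.02

def encode(x):
    """Run the 21 MLP encoders on one fused feature vector -> (21, D_EMB)."""
    h = np.maximum(np.einsum('d,kde->ke', x, W1), 0.0)  # ReLU hidden layer
    return np.einsum('ke,kef->kf', h, W2)

# Learnable 21x21 mixing matrix, initialized with the perceptions' PCC
# matrix estimated from training labels (identity used as a stand-in).
pcc_init = np.eye(N_PERCEPTIONS)
mix = pcc_init.copy()

def attention(E):
    """Single-head self-attention over the 21 perception embeddings."""
    scores = E @ E.T / np.sqrt(E.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ E

x = rng.standard_normal(D_IN)   # fused multimodal features for one clip
E = encode(x)                   # (21, D_EMB) perception embeddings
E = mix @ E                     # joint mixing via the PCC-initialized matrix
E = attention(E)                # attention block for further joint learning
print(E.shape)                  # (21, 64)
```

From the 21 jointly learned embeddings, prediction heads would read off the 16 required perception scores; initializing the mixing matrix with the label PCCs lets correlated perceptions share evidence from the start of training.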