Learning with Alignments: Tackling the Inter- and Intra-domain Shifts for Cross-multidomain Facial Expression Recognition
Abstract: Facial Expression Recognition (FER) holds significant importance in human-computer interaction. Existing cross-domain FER methods often transfer knowledge solely from a single labeled source domain to an unlabeled target domain, neglecting the comprehensive information available across multiple sources. Nevertheless, cross-multidomain FER (CMFER) is very challenging due to (i) the inherent inter-domain shifts across multiple domains and (ii) the intra-domain shifts stemming from ambiguous expressions and low inter-class distinction. In this paper, we propose a novel Learning with Alignments CMFER framework, named LA-CMFER, to handle both inter- and intra-domain shifts. Specifically, LA-CMFER is constructed with a global branch and a local branch to extract features from full images and from local subtle expressions, respectively. Based on this, LA-CMFER presents a dual-level inter-domain alignment method that forces the model to prioritize hard-to-align samples in knowledge transfer at the sample level, while gradually producing a well-clustered feature space under the guidance of class attributes at the cluster level, thus narrowing the inter-domain shifts. To address the intra-domain shifts, LA-CMFER introduces a multi-view intra-domain alignment method with a multi-view clustering consistency constraint, in which a prediction similarity matrix is built to pursue consistency between the global and local views, thus refining pseudo labels and eliminating latent noise. Extensive experiments on six benchmark datasets validate the superiority of our LA-CMFER.
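To make the multi-view clustering consistency constraint more concrete, the sketch below gives one possible reading of it in PyTorch. This is not the authors' implementation: the function names, the confidence threshold, and the interpretation of the "prediction similarity matrix" as pairwise similarities of softmax predictions within a batch are all assumptions made for illustration only.

```python
# Hedged sketch (not the authors' code): one plausible form of the multi-view
# clustering consistency constraint and pseudo-label refinement described in
# the abstract. `global_logits` / `local_logits` are assumed to be per-sample
# class logits from the global and local branches for a target-domain batch.
import torch
import torch.nn.functional as F


def multi_view_consistency_loss(global_logits: torch.Tensor,
                                local_logits: torch.Tensor) -> torch.Tensor:
    """Encourage the global and local views to induce the same
    sample-to-sample prediction structure on a target batch."""
    p_g = F.softmax(global_logits, dim=1)   # (B, C) global-view predictions
    p_l = F.softmax(local_logits, dim=1)    # (B, C) local-view predictions
    sim_g = p_g @ p_g.t()                   # (B, B) prediction similarity matrix, global view
    sim_l = p_l @ p_l.t()                   # (B, B) prediction similarity matrix, local view
    return F.mse_loss(sim_g, sim_l)         # pull the two views' structures together


def refine_pseudo_labels(global_logits: torch.Tensor,
                         local_logits: torch.Tensor,
                         threshold: float = 0.8):
    """Keep a pseudo label only when both views agree and are confident,
    discarding likely-noisy targets (a common filtering heuristic, assumed here)."""
    p_g = F.softmax(global_logits, dim=1)
    p_l = F.softmax(local_logits, dim=1)
    conf, labels = ((p_g + p_l) / 2).max(dim=1)     # averaged confidence and label
    agree = p_g.argmax(dim=1) == p_l.argmax(dim=1)  # cross-view agreement
    keep = agree & (conf > threshold)               # mask of retained pseudo labels
    return labels, keep
```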
Primary Subject Area: [Content] Vision and Language
Relevance To Conference: Our paper focuses on cross-multidomain facial expression recognition, closely aligning with the "Emotional and Social Signals" theme at ACM MM. Our research leverages deep learning to bridge computers and human emotions by analyzing, processing, and recognizing facial expressions, providing substantial technical support for deeper human-computer interaction.
Unlike conventional approaches that rely on transferring knowledge from a single labeled source domain to an unlabeled target domain, LA-CMFER capitalizes on the wealth of information available across multiple sources. The proposed Learning with Alignments CMFER framework addresses both the inter-domain shifts across domains and the intra-domain shifts caused by annotation uncertainty and category confusion. Its dual-branch feature extraction, with a global branch capturing features from entire images and a local branch discerning subtle expressions, bolsters the robustness of FER across diverse domains. Furthermore, LA-CMFER employs dual-level inter-domain alignment to prioritize hard-to-align samples during knowledge transfer while progressively refining the feature space under the guidance of class attributes, thereby mitigating inter-domain shifts. LA-CMFER also tackles intra-domain shifts through a multi-view intra-domain alignment method that enforces consistency between the global and local views, refining pseudo labels and eliminating latent noise. Extensive experiments on six benchmark datasets validate LA-CMFER's superiority and underscore its potential to advance facial expression recognition within multimedia and multimodal processing.
Supplementary Material: zip
Submission Number: 1746