Handling ambiguous annotations for facial expression recognition in the wild

Published: 01 Jan 2021 · Last Modified: 07 Nov 2024 · ICVGIP 2021 · CC BY-SA 4.0
Abstract: Annotation ambiguity, arising from annotator subjectivity, crowd-sourcing, inter-class similarity, and the poor quality of facial expression images, has been a key challenge for robust Facial Expression Recognition (FER). Recent deep learning (DL) solutions to this problem select clean samples for training by running two or more networks simultaneously. Based on the observation that, unlike clean samples, wrongly annotated samples yield inconsistent predictions when transformed with different augmentations, we propose a simple and effective single-network FER framework that is robust to noisy annotations. Specifically, we qualify an image as clean (correctly labeled) if the Jensen-Shannon (JS) divergence between its ground-truth distribution and the predicted distribution of its weakly augmented version is smaller than a dynamically tuned threshold. The qualified clean samples provide supervision during training. Further, to learn hard samples (correctly labeled but difficult to classify), we enforce consistency between the predicted distributions of the weakly and strongly augmented versions of every training image through a consistency loss. Comprehensive experiments on FER datasets such as RAF-DB, FERPlus, curated FEC, and AffectNet, under both synthetic and real noisy-annotation settings, demonstrate the robustness of the proposed method. The source code is publicly available at https://github.com/1980x/HandlingAmbigiousFERAnnotations.
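For a concrete picture of the selection-plus-consistency scheme the abstract describes, below is a minimal, hypothetical PyTorch sketch reconstructed from the abstract alone. The `model` interface, the threshold value, the weighting factor `lam`, and the use of JS divergence for the consistency term are all illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between rows of two (batch, classes) distributions."""
    m = 0.5 * (p + q)
    kl_pm = (p * ((p + eps).log() - (m + eps).log())).sum(dim=1)
    kl_qm = (q * ((q + eps).log() - (m + eps).log())).sum(dim=1)
    return 0.5 * (kl_pm + kl_qm)

def training_loss(model, x_weak, x_strong, labels, num_classes, threshold, lam=1.0):
    """Loss for one batch: supervision on qualified clean samples plus a
    weak/strong consistency term. `threshold` stands in for the paper's
    dynamically tuned value; `lam` is an assumed weighting factor."""
    logits_w = model(x_weak)          # logits for weakly augmented images
    logits_s = model(x_strong)        # logits for strongly augmented images
    p_w = F.softmax(logits_w, dim=1)
    p_s = F.softmax(logits_s, dim=1)
    gt = F.one_hot(labels, num_classes).float()

    # An image qualifies as clean if JS(ground truth, weak prediction) < threshold.
    clean = js_divergence(gt, p_w) < threshold

    # Supervised cross-entropy only on the qualified clean samples.
    if clean.any():
        sup = F.cross_entropy(logits_w[clean], labels[clean])
    else:
        sup = logits_w.sum() * 0.0    # keeps the graph valid when no sample qualifies

    # Consistency between the weak and strong views of every training image
    # (JS divergence used here for illustration; the abstract does not fix the form).
    cons = js_divergence(p_w, p_s).mean()
    return sup + lam * cons
```

In the paper the threshold adapts over training rather than staying fixed; a simple stand-in would schedule it from batch JS-divergence statistics (e.g., a running percentile), though the exact tuning rule is not specified in the abstract.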