Abstract: The automatic detection of facial muscle activations, i.e. the detection of the so-called facial Action Units (AUs), has received significant attention owing to the use of facial expression analysis and recognition in areas such as affect recognition and behavior analysis. However, recognizing subtle expressions is a challenging task that requires a multimodal approach drawing on several sources of information. In this paper, we follow such an approach and propose a novel Deep Learning architecture that fuses information from several specialized Deep Neural Networks (DNNs), each of which models a different aspect of the problem. At the core of our approach is a novel dynamic adaptation of the deep network cost function that deals with the data imbalances inherent in multilabel classification problems; this allows cross-database training. We show the benefits of the proposed training approach and how different architectures are better suited to particular AUs. Extensive experimental results show that our multimodal approach outperforms the state of the art by a considerable margin.
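The abstract does not say how the cost function is adapted. As a purely illustrative sketch, and not the authors' method, one common way to counter label imbalance in multilabel AU detection is to re-weight the positive term of a per-AU binary cross-entropy using label frequencies recomputed on every mini-batch. The function names `batch_pos_weights` and `dynamic_weighted_bce`, and the choice of a negatives-to-positives weight ratio, are hypothetical assumptions made for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: per-batch ("dynamic") re-weighting of a multilabel
# binary cross-entropy. This is NOT the method from the paper, only a
# generic example of adapting the loss to label imbalance.


def batch_pos_weights(labels: Sequence[Sequence[int]]) -> List[float]:
    """Weight for the positive term of each AU, recomputed from the current batch.

    weight_j = (#negatives for AU j) / (#positives for AU j), so AUs that are
    rarely active in this batch get a larger weight on their positive examples.
    """
    n = len(labels)
    num_aus = len(labels[0])
    weights = []
    for j in range(num_aus):
        pos = sum(row[j] for row in labels)
        neg = n - pos
        # With no positives in the batch the positive term never fires,
        # so a neutral weight of 1.0 is used.
        weights.append(neg / pos if pos > 0 else 1.0)
    return weights


def dynamic_weighted_bce(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    eps: float = 1e-7,
) -> float:
    """Mean weighted binary cross-entropy over all (sample, AU) pairs."""
    w = batch_pos_weights(labels)
    n = len(labels)
    num_aus = len(labels[0])
    total = 0.0
    for i in range(n):
        for j in range(num_aus):
            # Clip predicted probabilities to avoid log(0).
            p = min(max(probs[i][j], eps), 1.0 - eps)
            y = labels[i][j]
            total += -(w[j] * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / (n * num_aus)


if __name__ == "__main__":
    # Toy batch: 4 samples, 2 AUs; AU 0 is rare, AU 1 is balanced.
    labels = [[1, 0], [0, 0], [0, 1], [0, 1]]
    print(batch_pos_weights(labels))  # [3.0, 1.0]
    print(dynamic_weighted_bce([[0.5, 0.5]] * 4, labels))
```

Because the weights are recomputed from each batch rather than fixed from global dataset statistics, they follow the label distribution of whatever data is currently being fed in, which is one plausible reading of why a dynamic cost could help when batches are drawn from several databases with different AU frequencies.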