Abstract: Face detectors are a subset of object detectors that output, at a minimum, the locations of any human faces present in an image. Face detection is challenging in part because frontal-view faces exhibit low variance in structural content (i.e., most faces have two eyes, a nose, and a mouth) but high variance in visual appearance. This property of the domain skews detectors toward higher false-positive rates, because many image patches contain features spatially consistent with frontal-view faces. In this study, we evaluate the performance of three state-of-the-art face detectors (BlazeFace, MTCNN, and SCRFD) on frontal-view face imagery in a novel human-labeled dataset of 64,104 images with reliable ground truth. By altering the spectral and color content of frontal-view face images, we present evidence that modern CNN-based models rely heavily on low-level image features, despite their capacity to learn complex, discriminative visual features and concepts. To better understand detector failures, we apply the Deep Dream technique to amplify the image features that lead models to false positives.