Abstract: CelebA is the most common and largest scale dataset used to evaluate methods for facial attribute prediction, an important benchmark in imbalanced classification and face analysis. However, we argue that the evaluation metrics and baseline models currently used to compare the performance of different methods are insufficient for determining which approaches are best at classifying highly imbalanced attributes. We are able to obtain results comparable to current state-of-the-art using a ResNet-18 model trained with binary cross-entropy, a substantially less sophisticated approach than related work. We also show that we can obtain near-state-of-the-art results on accuracy using a model trained with just 10% of CelebA, and on balanced accuracy simply by maximizing recall for imbalanced attributes at the expense of all other metrics. To deal with these issues, we suggest several improvements to model evaluation including better metrics, stronger baselines, and increased awareness of the limitations of the dataset.
0 Replies
Loading