Abstract: In this paper, we propose a large training dataset named
Celeb-500K for face recognition, which contains 50M images
from 500K persons. To better facilitate academic research,
we clean Celeb-500K to obtain Celeb-500K-2R, which contains 25M aligned face images from 365K persons. Based on
the developed dataset, we achieve state-of-the-art face recognition performance and make two important observations about
face recognition. First, metric learning methods yield
limited performance gains when the training dataset contains
a large number of identities. Second, when building an
efficient training dataset, the number of identities matters more than the average number of images per identity in terms of
face recognition performance. Extensive
experimental results show the superiority of Celeb-500K and
provide strong support for the two observations.