CMT Id: 334
Abstract: Pose variation and subtle differences in appearance are key challenges to
fine-grained classification. While deep networks have markedly improved general
recognition, many approaches to fine-grained recognition rely on anchoring
networks to parts for better accuracy. Identifying parts to find correspondence
discounts pose variation so that features can be tuned to appearance. To this end,
previous methods have examined how to find parts and extract pose-normalized
features. These methods have generally separated fine-grained recognition into
stages which first localize parts using hand-engineered and coarsely-localized
proposal features, and then separately learn deep descriptors centered on inferred
part positions. We unify these steps in an end-to-end trainable network supervised by
keypoint locations and class labels that localizes parts by a fully convolutional
network to focus the learning of feature representations for the fine-grained
classification task. Experiments on the popular CUB200 dataset show that our method
is state-of-the-art and suggest a continuing role for strong supervision.
Conflicts: eecs.berkeley.edu, snapchat.com