Fine-grained pose prediction, normalization, and recognition

Ning Zhang, Evan Shelhamer, Yang Gao, Trevor Darrell

Feb 24, 2016 (modified: Feb 24, 2016) ICLR 2016 workshop submission readers: everyone
  • CMT id: 334
  • Abstract: Pose variation and subtle differences in appearance are key challenges to fine-grained classification. While deep networks have markedly improved general recognition, many approaches to fine-grained recognition rely on anchoring networks to parts for better accuracy. Identifying parts to find correspondence discounts pose variation so that features can be tuned to appearance. To this end previous methods have examined how to find parts and extract pose-normalized features. These methods have generally separated fine-grained recognition into stages which first localize parts using hand-engineered and coarsely-localized proposal features, and then separately learn deep descriptors centered on inferred part positions. We unify these steps in an end-to-end trainable network supervised by keypoint locations and class labels that localizes parts by a fully convolutional network to focus the learning of feature representations for the fine-grained classification task. Experiments on the popular CUB200 dataset show that our method is state-of-the-art and suggest a continuing role for strong supervision.
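
The abstract describes localizing parts with a fully convolutional network and then learning features centered on the inferred part positions. As a rough illustration of that hand-off only, the sketch below assumes the localizer emits one heatmap per part, takes the peak of each heatmap as the part position, and crops a fixed-size window around it. The function names `locate_parts` and `crop_parts`, the argmax readout, and the clamped square crop are all assumptions made for this example; the abstract does not specify how the authors' network does this, and a hard argmax crop is not by itself end-to-end trainable.

```python
import numpy as np


def locate_parts(heatmaps):
    """Return the (row, col) peak of each part heatmap as an array of shape (K, 2).

    Assumes `heatmaps` has shape (K, H, W), one map per part (hypothetical layout).
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (h, w))
    return np.stack([rows, cols], axis=1)


def crop_parts(image, centers, size):
    """Crop a size x size window around each center, clamped to the image bounds.

    Assumes `size` is no larger than the image height or width.
    """
    h, w = image.shape[:2]
    half = size // 2
    crops = []
    for r, c in centers:
        # Shift the window inward when the center is near a border.
        top = int(np.clip(r - half, 0, h - size))
        left = int(np.clip(c - half, 0, w - size))
        crops.append(image[top:top + size, left:left + size])
    return np.stack(crops)


# Toy usage: two parts on an 8x8 "image".
heatmaps = np.zeros((2, 8, 8))
heatmaps[0, 1, 2] = 1.0
heatmaps[1, 6, 5] = 1.0
image = np.arange(64).reshape(8, 8)
centers = locate_parts(heatmaps)
crops = crop_parts(image, centers, size=4)
```

In a pipeline of the kind the abstract outlines, each crop would then be passed to a feature extractor and the per-part features combined for classification; that stage is omitted here.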