Abstract: Classifying fine-grained objects from only a few reference samples is challenging because fine-grained tasks intrinsically exhibit large intra-class and small inter-class variance, and the few-shot setting adds a risk of overfitting. Previous work relies on models pretrained on tasks sampled from base classes with sufficient training data. Although much progress has been made, performance remains far from satisfactory. In this study, inspired by the observations that human vision recognizes objects compositionally and that fine-grained objects share morphological structures, we propose a weakly-supervised structural subspace learning (W3SL) method for few-shot fine-grained recognition (FSFGR). To this end, a group of subspace features is obtained from linear projections of the CNN feature. Specifically, a classification loss in each subspace and a similarity regularization between the subspace projection matrices guide the subspaces toward a discriminative structural geometry. Moreover, KL divergences between the outputs of the CNN and those of the subspace features are used to distill knowledge into these subspaces. As a result, the low-dimensional subspace features have a strong capacity to represent data from different classes. Extensive experiments on five fine-grained benchmarks verify that our method generalizes effectively to novel few-shot tasks without hurting performance on base and whole-class few-shot tasks.
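The three training signals named in the abstract (a per-subspace classification loss, a similarity regularization between projection matrices, and KL-divergence distillation from the CNN output) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `subspace_losses`, the weights `lam` and `mu`, the squared-overlap form of the regularizer, and the direction of the KL term are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class.
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def kl_divergence(p, q):
    # Mean KL(p || q) over the batch; p acts as the teacher here (assumed direction).
    return np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=1))

def subspace_losses(feat, labels, full_logits, projections, classifiers,
                    lam=0.1, mu=1.0):
    """Hypothetical combination of the three loss terms.

    feat:        (N, D) CNN features
    labels:      (N,)   integer class labels
    full_logits: (N, C) logits of the classifier on the full CNN feature
    projections: list of (D, d) linear projection matrices, one per subspace
    classifiers: list of (d, C) linear classifiers, one per subspace
    lam, mu:     assumed weights for the regularizer and distillation terms
    """
    teacher = softmax(full_logits)
    cls_loss, kd_loss = 0.0, 0.0
    for P, W in zip(projections, classifiers):
        sub = feat @ P                # (N, d) low-dimensional subspace feature
        probs = softmax(sub @ W)      # (N, C) subspace prediction
        cls_loss += cross_entropy(probs, labels)
        kd_loss += kl_divergence(teacher, probs)

    # Assumed regularizer: penalize overlap between different projection
    # matrices via the squared Frobenius norm of P_i^T P_j.
    sim = 0.0
    for i in range(len(projections)):
        for j in range(i + 1, len(projections)):
            sim += np.sum((projections[i].T @ projections[j]) ** 2)

    total = cls_loss + lam * sim + mu * kd_loss
    return {"classification": cls_loss, "similarity": sim,
            "distillation": kd_loss, "total": total}
```

Under this assumed form, projections onto disjoint coordinate blocks incur no similarity penalty, while identical projections are penalized most, which is one plausible way to encourage the subspaces to capture different structure.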