Capturing Prior Knowledge in Soft Labels for Classification with Limited or Imbalanced Data

2022 (modified: 17 Nov 2022), PRCV (2) 2022
Abstract: Successful applications of deep learning often depend on large amounts of training data. In practical image recognition tasks, however, the available training data are often limited or imbalanced across classes, causing over-fitting or prediction bias during model training. In this paper, prior knowledge about the relationships between image classes, derived from word embedding models developed in natural language processing, is utilized to help train more generalizable classifiers under the condition of limited or class-imbalanced training data. This inter-class relational knowledge is captured in the word embedding vectors of the textual names of the image classes. Using these word embedding vectors as soft labels for the corresponding image classes, the feature extractor of a deep learning model can be guided to learn visual features that contain both class-specific and class-shared information. Experiments on multiple image classification datasets confirm that the proposed learning framework improves model performance when training data are limited or class-imbalanced.
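The core idea in the abstract, replacing one-hot targets with word-embedding "soft labels" and classifying by nearest class embedding, can be sketched minimally as follows. This is an illustration, not the paper's implementation: the random class embeddings stand in for real word vectors (e.g. word2vec or GloVe vectors for the class names), and the cosine-based loss and nearest-neighbor prediction rule are assumed design choices.

```python
import numpy as np

# Hypothetical word-embedding "soft labels" for three image classes.
# In the paper these would come from a pretrained word-embedding model;
# random vectors are used here purely for illustration.
rng = np.random.default_rng(0)
CLASS_NAMES = ["cat", "dog", "car"]
EMB_DIM = 8
class_embeddings = rng.normal(size=(len(CLASS_NAMES), EMB_DIM))

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def soft_label_loss(feature, class_idx):
    """Regression-style training signal: push the extracted visual
    feature toward the word embedding of its class (1 - cosine sim)."""
    return 1.0 - cosine(feature, class_embeddings[class_idx])

def predict(feature):
    """Classify a feature by its nearest class embedding."""
    sims = [cosine(feature, e) for e in class_embeddings]
    return int(np.argmax(sims))

# A feature already aligned with the "cat" embedding incurs a small
# loss for class 0 and a larger loss for the other classes.
feature = class_embeddings[0] + 0.01 * rng.normal(size=EMB_DIM)
print(CLASS_NAMES[predict(feature)])
print(soft_label_loss(feature, 0) < soft_label_loss(feature, 1))
```

Because the targets live in a shared embedding space, semantically related classes (e.g. "cat" and "dog") have nearby target vectors, which is how the framework lets the feature extractor share information across classes rather than treating them as mutually independent one-hot codes.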