Few-shot Incremental Learning with Textual Knowledge Embedding by Visual-language Model

Hantao Yao, Lu Yu, Changsheng Xu

Published: 2024, Last Modified: 13 Nov 2024 | Int. J. Softw. Informatics 2024 | CC BY-SA 4.0

Abstract: Real-world scenarios often involve data scarcity and dynamically changing data. Few-shot incremental learning aims to learn new knowledge from a small amount of data while reducing the model's catastrophic forgetting of old knowledge. Existing few-shot incremental learning algorithms (CEC, FACT, etc.) mainly use visual features to adjust the feature encoder or classifier, so that the model transfers to new data without forgetting old data. However, the visual features of a small amount of data rarely capture the complete feature distribution of a category, which results in weak generalization ability. The text features of image category descriptions generalize better and resist forgetting better than visual features. Therefore, building on the Visual-Language Model (VLM), we propose a textual knowledge embedding mode that embeds text features with anti-forgetting ability into visual features, thus achieving effective learning of both new and old category data in few-shot incremental learning. Specifically, in the basic learning stage, we use the VLM to extract pre-trained visual features and category text descriptions. We then use the text encoder to project the pre-trained visual features into the text space. Next, we use the visual encoder to fuse the learned text features with the pre-trained visual features, yielding enhanced visual features with high discrimination ability. In the incremental learning stage, we use the category space encoding of old and new data features to fine-tune the visual encoder and text encoder, learning knowledge from new data while reviewing old knowledge. We verify the effectiveness of the algorithm on four datasets (CIFAR-100, CUB-200, Stanford Cars, and miniImageNet), showing that textual knowledge embedding based on a large-scale VLM can further improve the robustness of few-shot incremental learning on top of visual features.
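The pipeline outlined in the abstract (project visual features into the text space, fuse them with the original visual features, then classify against category text features) can be illustrated with a toy sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the linear map `W_text` stands in for the learned text encoder, a weighted sum with the hypothetical parameter `alpha` stands in for the fusion performed by the visual encoder, the hand-written 3-dimensional vectors stand in for VLM features, and cosine-similarity classification is borrowed from common CLIP-style practice rather than taken from the paper.

```python
import math

def normalize(v):
    """Scale a vector to unit length (guarding against a zero vector)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def embed_text_knowledge(visual_feat, W_text, alpha=0.5):
    """Assumed stand-in for textual knowledge embedding:
    project the visual feature into text space with W_text,
    then fuse the projection with the original feature."""
    text_like = matvec(W_text, visual_feat)
    fused = [(1 - alpha) * v + alpha * t for v, t in zip(visual_feat, text_like)]
    return normalize(fused)

def classify(visual_feat, class_text_feats, W_text, alpha=0.5):
    """Pick the category whose text feature is closest to the fused feature."""
    fused = embed_text_knowledge(visual_feat, W_text, alpha)
    scores = {name: cosine(fused, t) for name, t in class_text_feats.items()}
    return max(scores, key=scores.get)

# Toy data: identity projection and one-hot "text features" (all hypothetical).
W_text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
class_text_feats = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0]}
pred_base = classify([0.9, 0.1, 0.0], class_text_feats, W_text)

# Incremental step: a new category only adds a text feature to the label space.
class_text_feats["bird"] = [0.0, 0.0, 1.0]
pred_new = classify([0.1, 0.0, 0.8], class_text_feats, W_text)
pred_old = classify([0.9, 0.1, 0.0], class_text_feats, W_text)
```

In this sketch, adding a category extends the label space without touching the features of old categories, which gives a rough intuition for why text features may help against forgetting. The fine-tuning of both encoders in the incremental stage, as described in the abstract, is not modeled here.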