Background and Visual Feature-Aware Data Augmentation for FGIR via Image Generation

Takuya Kato, Shion Serizawa, Mitsuki Okayama, Yuta Nakano, Tatsuhito Hasegawa

Published: 2024, Last Modified: 10 Jun 2025, GCCE 2024, CC BY-SA 4.0

Abstract: Fine-Grained Image Recognition (FGIR) involves distinguishing subtle differences between classes within the same category. This is a challenging task because of high inter-class similarity and high intra-class variability. Improving accuracy typically requires large, well-labeled datasets, which are difficult to obtain for FGIR. We propose a method that augments datasets using an image generative AI model. We investigated input text prompts that specify the target class name together with diverse backgrounds, and we used a multimodal model to incorporate the target class's visual features. Our method also employs an image processing pipeline for background replacement. Our experiments show that although Text-to-Image generation struggles to represent detailed visual features, it improves accuracy in one-shot learning scenarios. Additionally, using image generative AI models for background replacement can outperform baseline methods under certain conditions, highlighting the effectiveness of our method.
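The prompt-construction idea (target class name combined with diverse backgrounds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The template string, the background phrases, and the function name `build_prompts` are hypothetical. The call to the text-to-image model itself is omitted. Each prompt is paired with its class index so that a generated image can inherit the label it was prompted for.

```python
from itertools import product

# Hypothetical background phrases for illustration only; the abstract does not
# list the backgrounds actually used in the paper.
DEFAULT_BACKGROUNDS = (
    "on a sandy beach",
    "in a dense forest",
    "in a snowy field",
    "on a city street",
)

# Hypothetical prompt template; the paper's exact prompt wording is not given.
DEFAULT_TEMPLATE = "a photo of a {cls} {bg}"


def build_prompts(class_names, backgrounds=DEFAULT_BACKGROUNDS, template=DEFAULT_TEMPLATE):
    """Return (prompt, label_index) pairs, one per class/background combination.

    The label index lets each generated image inherit the class it was
    prompted for, so synthetic images can be added to the training set directly.
    """
    pairs = []
    # product iterates classes in the outer loop and backgrounds in the inner loop.
    for (label, cls), bg in product(enumerate(class_names), backgrounds):
        pairs.append((template.format(cls=cls, bg=bg), label))
    return pairs


if __name__ == "__main__":
    classes = ["Black-footed Albatross", "Laysan Albatross"]
    for prompt, label in build_prompts(classes, backgrounds=("on a sandy beach", "over the open sea")):
        print(label, prompt)
```

Varying only the background phrase while holding the class name fixed is one simple way to diversify context without changing the label. The fine-grained visual features of the class still depend entirely on what the generative model can render from the name alone.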