Multi-modal data augmentation based on masked modeling for image-text retrieval

Published: 01 Jan 2025, Last Modified: 05 Jul 2025
Venue: Knowl. Based Syst. 2025
License: CC BY-SA 4.0
Abstract: Highlights
• We propose a masked modeling-based multi-modal data augmentation method for image–text retrieval tasks.
• We propose a novel metric that considers both consistency and diversity to measure the quality of the augmented samples, which is used to filter them.
• Experiments on several datasets show that our method outperforms other multi-modal and uni-modal augmentation methods.
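The highlights do not say how the filtering metric is computed. Purely as a hypothetical illustration of scoring an augmented image–text pair on both consistency and diversity, the sketch below assumes that every image and text has already been embedded as a vector (for example by a CLIP-style encoder, which is an assumption). It measures consistency as the cosine similarity between the augmented image and augmented text embeddings, and diversity as one minus the average similarity of the augmented pair to the original pair. The masked-modeling step that generates the candidates is not shown. The names `quality_score`, `filter_augmented`, `alpha`, and `threshold` are invented for this example and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (image embedding, text embedding)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def quality_score(original: Pair, augmented: Pair, alpha: float = 0.5) -> float:
    """Hypothetical score: weighted sum of consistency and diversity."""
    orig_img, orig_txt = original
    aug_img, aug_txt = augmented
    # Consistency (assumed definition): augmented image and text should still match.
    consistency = cosine(aug_img, aug_txt)
    # Diversity (assumed definition): the augmented pair should differ from the original.
    diversity = 1.0 - 0.5 * (cosine(orig_img, aug_img) + cosine(orig_txt, aug_txt))
    return alpha * consistency + (1.0 - alpha) * diversity


def filter_augmented(original: Pair, candidates: List[Pair],
                     threshold: float, alpha: float = 0.5) -> List[Pair]:
    """Keep only candidates whose hypothetical quality score meets the threshold."""
    return [c for c in candidates if quality_score(original, c, alpha) >= threshold]


if __name__ == "__main__":
    original = ([1.0, 0.0], [1.0, 0.0])
    candidates = [
        ([1.0, 0.0], [1.0, 0.0]),  # copy of the original: consistent but not diverse
        ([0.0, 1.0], [0.0, 1.0]),  # both modalities changed together
        ([0.0, 1.0], [1.0, 0.0]),  # image changed, text not: inconsistent
    ]
    for c in candidates:
        print(round(quality_score(original, c), 2))
    print(filter_augmented(original, candidates, threshold=0.75))
```

With `alpha=0.5` and `threshold=0.75`, a candidate that merely copies the original pair scores 0.5 and is dropped, one whose image and text drift apart scores 0.25 and is dropped, and one that changes both modalities consistently scores 1.0 and is kept. A weighted sum is only one simple way to trade off the two criteria; the paper's actual metric may combine them differently.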