Generative imputation of incomplete images: Leveraging multimodal information for missing pixel imputation

Published: 2025 · Last Modified: 07 Jan 2026 · Inf. Sci. 2025 · CC BY-SA 4.0
Abstract: Missing pixels are a common issue in real-world images, arising from factors such as hardware malfunctions, sensor errors, and other unforeseen circumstances. This prevalence has made incomplete image imputation a critical research area, garnering broad attention. However, as data volumes continue to grow, traditional imputation methods that rely exclusively on information from the target images are becoming less effective, particularly when the proportion of missing pixels is high. To address this challenge, we propose a novel imputation model named MMIGAN (Multi-modal Imputation Generative Adversarial Network), which imputes incomplete images by leveraging not only the information in the images themselves but also additional information from corresponding texts. Specifically, MMIGAN is a GAN-based model in which the generator G comprises a cross-modality feature learning subnet that extracts multimodal features and a missing-value (MV) imputation subnet that outputs the imputed images. Meanwhile, the discriminator D attempts to distinguish between real (observed) and fake (imputed) pixels to improve imputation accuracy. We conducted extensive experiments on the Flickr8k, Flickr30k, and COCO datasets, demonstrating that MMIGAN surpasses state-of-the-art methods on image inpainting tasks. Across varying missing rates, the peak performance improvements on these datasets reached 52.5%, 61.0%, and 54.6% respectively, with minimum improvements of 38.2%, 39.5%, and 35.2%. These results demonstrate both the superiority of MMIGAN and the effectiveness of multimodal information fusion for image inpainting. The code is available at https://github.com/guoynow/MMIGAN.git.
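The abstract describes a mask-based scheme in which the generator reconstructs the full image and only the missing positions are filled in, while the discriminator judges pixels as observed versus imputed. A minimal NumPy sketch of that combination step is below; the function name `impute` and the mask convention (1 = observed, 0 = missing) are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def impute(observed, mask, generated):
    """Combine observed pixels with the generator's output.

    observed  : image array; values at missing positions are arbitrary
    mask      : same shape; 1 where a pixel is observed, 0 where missing
                (this convention is an assumption for illustration)
    generated : generator G's reconstruction of the complete image

    Observed pixels are kept as-is and only missing positions are
    replaced by G's output, mirroring the GAN-based imputation the
    abstract describes. The discriminator would then try to tell the
    observed entries from the imputed ones in the result.
    """
    return mask * observed + (1 - mask) * generated

# Toy 1-D example: the middle pixel is missing and gets G's value.
obs = np.array([1.0, 0.0, 3.0])
m = np.array([1.0, 0.0, 1.0])
gen = np.array([9.0, 2.0, 9.0])
print(impute(obs, m, gen))  # observed pixels 1.0 and 3.0 survive; 2.0 is imputed
```

Note that the actual MMIGAN generator conditions this reconstruction on text features from the cross-modality subnet; the sketch only shows how observed and imputed pixels are merged.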