A Novel Multi-view Perception and Shrinkage Aggregation Network for Inharmonious Region Localization

Shenghao Chen, Chunjie Ma, Yibo Zhao, Meng Liu, Yanbing Xue, Zan Gao

Published: 01 Jan 2025 · Last Modified: 16 Jan 2026 · IEEE Transactions on Circuits and Systems for Video Technology · CC BY-SA 4.0
Abstract: With the popularity of image editing techniques, synthetic images may contain inharmonious regions caused by color/illumination differences between the manipulated area and the background. The inharmonious region localization task aims to find these regions, which is crucial for blind image harmonization. Existing methods rely on single-view images and do not fully explore multi-scale fusion, which limits their performance. To address these issues, in this paper we propose a novel multi-view perception and shrinkage aggregation network (MSANet) for the inharmonious region localization task that fully utilizes multi-view images and multi-scale fusion information and can mine subtle cues between candidate objects and the background. Specifically, we first design a multi-view ensemble encoder that fully perceives inharmonious regions through multi-view interactive learning and then aggregates their feature representations. Moreover, we propose a multi-scale shrinkage fusion decoder, in which multi-scale features carrying multi-view prior information are used to aggregate adjacent features, adaptively select high-quality information, reduce background interference, and gradually locate inharmonious regions. Extensive experimental results on four public datasets (HAdobe5K, HCOCO, HFlickr, and Hday2Night) demonstrate that the proposed MSANet outperforms all state-of-the-art (SOTA) methods in terms of average F1 and average IoU score, while maintaining a lower computational cost.
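The decoder described in the abstract aggregates adjacent multi-scale features and adaptively selects information from them. The following is a minimal NumPy sketch of that general idea, not the paper's actual method: all function names, the nearest-neighbor upsampling, and the sigmoid gating scheme are illustrative assumptions.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor upsampling: (H, W, C) -> (2H, 2W, C).
    return x.repeat(2, axis=0).repeat(2, axis=1)

def gated_fuse(fine, coarse):
    # Hypothetical adaptive selection between two adjacent scales:
    # a sigmoid gate weights the fine feature against the upsampled
    # coarse feature, so the output is a convex combination of both.
    up = upsample2x(coarse)
    gate = 1.0 / (1.0 + np.exp(-(fine + up)))
    return gate * fine + (1.0 - gate) * up

def shrinkage_decode(features):
    # features: list of (H, W, C) maps ordered coarsest -> finest,
    # each scale doubling the resolution of the previous one.
    out = features[0]
    for f in features[1:]:
        out = gated_fuse(f, out)
    return out

# Example: fuse a 4x4, 8x8, and 16x16 feature pyramid.
pyramid = [np.ones((4, 4, 1)), np.zeros((8, 8, 1)), np.ones((16, 16, 1))]
decoded = shrinkage_decode(pyramid)  # shape (16, 16, 1)
```

In the actual MSANet, such fusion would operate on learned convolutional features with trained gating parameters; the sketch only shows the coarse-to-fine, pairwise aggregation pattern.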