Enhancing weakly supervised semantic segmentation for fruit defects detection via feature cross-attention fusion and multi-scale similarity quantification

Published: 2025, Last Modified: 21 Jan 2026. Venue: Pattern Anal. Appl. 2025. License: CC BY-SA 4.0
Abstract: Precise detection of surface defects is critical for enhancing the commercial value of fruit. Current automated sorting systems struggle with multi-defect differentiation and precise boundary delineation under weak supervision. We propose IFCNet, a novel weakly supervised semantic segmentation framework, to address these challenges. The framework enhances the quality of class activation maps (CAMs) to generate high-quality pseudo-masks via three innovations: (1) an inter-pixel feature similarity (IPFS) module that enhances feature representations through multi-scale feature fusion and inter-pixel similarity quantification; (2) a confident areas selection by background mask (CASBM) module that improves the supervision signals for the CAM refinement algorithm; and (3) a feature fusion by cross-attention (FFCA) module that adaptively balances the discriminability and completeness of representations. IFCNet generates pseudo-masks with a mean Intersection over Union (mIoU) of 71.688% on the test set of navel oranges. When the generated pseudo-masks are used to supervise the lightweight segmentation models FastSegFormer-P and FastSegFormer-E, the segmentation results achieve mIoUs of 78.504% and 79.412%, respectively. In addition, we verify the generalization potential of our method through zero-shot tests on cross-fruit defect datasets. Our method avoids the high cost of pixel-level annotation, providing an economical solution for automating the fine grading of fruit quality.
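For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a mean Intersection over Union score is commonly computed from a predicted label map and a ground-truth label map. It is not taken from the paper: the function name `mean_iou`, the toy label maps, and the choice to skip classes absent from both maps are assumptions made for illustration, and the authors' exact evaluation protocol may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that occur in pred or target (illustrative sketch).

    pred and target are equally sized 2D lists of integer class labels.
    Classes absent from both maps are skipped (an assumption, not the
    paper's stated protocol).
    """
    inter = [0] * num_classes  # pixels where prediction and target agree on class c
    union = [0] * num_classes  # pixels labelled c in prediction or target
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 3x3 label maps: 0 = background, 1 and 2 = two defect classes.
pred = [[0, 0, 1],
        [0, 1, 1],
        [2, 2, 1]]
target = [[0, 0, 1],
          [0, 0, 1],
          [2, 1, 1]]

score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoUs are 3/4, 3/5, and 1/2, so the mean is about 0.617. Benchmark scores are usually accumulated over a whole test set rather than a single image.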