Abstract: Missed polyps are the major risk factor for colorectal cancer. Many methods have been developed to minimize misdiagnosis. However, they rely on laborious instance-level annotations, require labeled prompt points, or cannot filter noisy proposals and detect polyps in their entirety, which leaves polyp detection a difficult problem. In this paper, we propose a novel Cooperation-Based network (CBNet), a two-stage polyp detection framework supervised by image labels. CBNet removes wrong proposals through classification in collaboration with segmentation and obtains a more accurate detector by aggregating adaptive multi-level regional features. Specifically, we design a Cooperation-Based Region Proposal Network (CBRPN) that reduces the negative impact of noise by deleting proposals without polyps, enabling our network to capture polyp features. Moreover, to enhance the location integrity and classification precision of polyps, we introduce an Adaptive ROI Fusion Module (ARFM) that aggregates multi-level region of interest (ROI) features under the guidance of the backbone classification layer. Extensive experiments on public and private datasets show that our method achieves state-of-the-art performance among weakly supervised methods and even outperforms full supervision in some respects. All code is available at https://github.com/dxqllp/CBNet.
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Our method is an application of multimedia processing to medical imaging. Its contributions to multimedia processing are as follows: 1) We successfully apply weak supervision to polyp detection, reducing the burden of data annotation and expanding the application of weak supervision in multimedia. 2) We design a collaborative mechanism that provides high-quality proposals to improve detection performance, which offers a new research direction for generating high-quality region proposals and mitigating local over-fitting. 3) Our work can detect small and flat objects, so it has the potential to be applied to other multimedia interpretation tasks, e.g., small target recognition and defect detection.
Supplementary Material: zip
Submission Number: 2117