Abstract: Colonoscopic Polyp Re-Identification (ReID) aims to match a specific polyp across a large gallery captured by different cameras and views, and plays a key role in computer-aided diagnosis for the prevention and treatment of colorectal cancer. However, traditional methods mainly focus on visual representation learning and neglect the potential of semantic features during training, which can lead to poor generalization when the pre-trained model is adapted to new scenarios. To address this issue, we propose a simple but effective training method named VT-ReID, which remarkably enriches the representation of polyp videos through the interchange of high-level semantic information. Moreover, we introduce a dynamic mechanism, DCM, which leverages contrastive learning to promote better separation between different categories. Empirical results show that our method outperforms current state-of-the-art methods by a clear margin.
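The abstract does not give implementation details, but as a rough illustration of how contrastive learning can couple visual and semantic (text) features, a symmetric InfoNCE-style cross-modal objective might look like the sketch below. This is a generic assumption-based example, not the authors' DCM; the function name, temperature value, and the pairing of each video clip with a text description are all hypothetical.

```python
# Minimal sketch (assumed, not the paper's implementation): a symmetric
# InfoNCE-style contrastive loss between visual and text embeddings,
# encouraging matched pairs to align and different categories to separate.
import torch
import torch.nn.functional as F


def cross_modal_contrastive_loss(visual_emb: torch.Tensor,
                                 text_emb: torch.Tensor,
                                 temperature: float = 0.07) -> torch.Tensor:
    """visual_emb, text_emb: (batch, dim) tensors where row i of each is a
    matched (positive) pair, e.g. a polyp clip and its text description."""
    v = F.normalize(visual_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                 # scaled cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric cross-entropy: video-to-text and text-to-video directions.
    loss_v2t = F.cross_entropy(logits, targets)
    loss_t2v = F.cross_entropy(logits.T, targets)
    return 0.5 * (loss_v2t + loss_t2v)


if __name__ == "__main__":
    vis = torch.randn(8, 256)   # stand-in pooled video features
    txt = torch.randn(8, 256)   # stand-in encoded descriptions
    print(cross_modal_contrastive_loss(vis, txt).item())
```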