ColonCLIP: An Adaptable Prompt-Driven Multi-Modal Strategy for Colonoscopy Image Diagnosis

Published: 01 Jan 2024, Last Modified: 22 Oct 2024 · ISBI 2024 · CC BY-SA 4.0
Abstract: Colonoscopy is a pivotal instrument for colorectal cancer detection and diagnosis. Images in colonoscopy reports typically contain valuable information from the procedure, but manually annotating these images is laborious and time-consuming given the high volume of daily procedures. The situation is further complicated when a few images with new categories appear, reflecting the complex nature of colorectal conditions and the flexibility of clinical annotations. Conventional uni-modal lesion classification models fall short in accommodating these new categories, while multi-modal fine-tuning methods often compromise the integrity of original category knowledge when integrating new ones, leading to subpar mixed-category test performance. In response to these challenges, we present OpenColonDB, the first comprehensive, open-access colonoscopy image diagnosis dataset, and introduce ColonCLIP. By integrating prompts during both training and testing, ColonCLIP effectively learns the features of the base categories and retains this knowledge even when a small amount of data from new categories is introduced, ensuring accurate mixed-category predictions after adaptation. Our experiments validate the efficacy of our approach; code is available at https://github.com/Zoe-TAN/ColonCLIP-OpenColonDB.
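For readers unfamiliar with prompt-driven multi-modal classification, the following is a minimal sketch of zero-shot image classification with text prompts against a CLIP backbone, the general mechanism the abstract alludes to. The checkpoint name, prompt template, category list, and image path are illustrative assumptions, not the actual ColonCLIP prompts or OpenColonDB labels; see the linked repository for the authors' implementation.

```python
# Minimal sketch: prompt-based zero-shot classification with a CLIP backbone.
# Categories and prompt template below are hypothetical examples, not the
# ColonCLIP prompts or the OpenColonDB label set.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical base categories plus one newly introduced category.
categories = ["polyp", "ulcer", "erosion", "normal mucosa"]
prompts = [f"a colonoscopy image showing {c}" for c in categories]

image = Image.open("colonoscopy_example.png")  # placeholder image path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity logits over the prompt set -> per-category probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
pred = categories[probs.argmax(dim=-1).item()]
print(f"predicted category: {pred}")
```

Because classification here is driven by the text prompts rather than a fixed classifier head, a new category can in principle be added by extending the prompt list, which is the kind of adaptability the abstract describes.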