Abstract: The large pretrained vision-language models (VLMs) have demonstrated remarkable data efficiency when transferred to the medical domain. However, the successful transfer hinges on the development of effective prompting strategies. Despite progress in this area, the application of VLMs to dentistry, a field characterized by complex, multi-level dental abnormalities and subtle features associated with minor dental issues, remains uncharted territory. To address this, we propose a novel approach for detecting dental abnormalities by prompting VLMs, leveraging the symmetrical structure of the oral cavity and guided by the dental notation system. Our framework consists of two main components: dental notation-aware tooth identification and multi-level dental abnormality detection. Initially, we prompt VLMs with tooth notations for enumerating each tooth to aid subsequent detection. We then initiate a multi-level detection of dental abnormalities with quadrant and tooth codes, prompting global abnormalities across the entire image and local abnormalities on the matched teeth. Our method harmonizes subtle features with global information for local-level abnormality detection. Extensive experiments on the re-annotated DETNEX dataset demonstrate that our proposed framework significantly improves performance by at least 4.3% mAP and 10.8% AP50 compared to state-of-the-art methods. Code and annotations will be released on https://github.com/CDchenlin/DentalVLM.
Loading