Abstract: The rapid development of LLMs has brought powerful text generation capabilities, leading to significant improvements in image captioning tasks. To address challenges specific to the medical domain, such as limited data availability, complex recognition requirements, and costly manual annotation, we extend image captioning to CBCT-based dentition defect diagnosis. Unlike traditional approaches that locate missing teeth with semantic segmentation or object detection, our method requires only standard CBCT images (with or without missing teeth) as input. Through combined image-text instruction tuning of our model, which integrates CLIP and SAM into BLIP2, we successfully extract missing-tooth location information from CBCT images and provide assessments in textual form. This substantially improves the extraction of clinical information and provides valuable diagnostic assistance to doctors. In terms of performance, our method outperforms both MSMedCap, which is specifically designed for medical imaging, and InstructBLIP, which is trained on general datasets, achieving state-of-the-art results in this pioneering application of image captioning to dentition defect diagnosis. The key raw data has been uploaded to Research Data Deposit (www.researchdata.org.cn), validating the authenticity of this paper with the RDD number:
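The abstract describes fusing two visual encoders (CLIP and SAM) into a BLIP2-style captioning pipeline but gives no implementation details. Below is a minimal sketch of one plausible fusion design, assuming a Q-Former-style set of learnable queries that cross-attend to the concatenated encoder tokens; the class name, dimensions, and fusion strategy are all assumptions for illustration, not the authors' actual architecture.

```python
# Hypothetical sketch (not the paper's code): projecting CLIP and SAM
# patch tokens into a shared space and compressing them with learnable
# queries, in the spirit of BLIP2's Q-Former bottleneck.
import torch
import torch.nn as nn

class DualEncoderQFormer(nn.Module):
    def __init__(self, clip_dim=1024, sam_dim=256, hidden=768, n_queries=32):
        super().__init__()
        # Project each encoder's tokens to a common width (dims assumed).
        self.clip_proj = nn.Linear(clip_dim, hidden)
        self.sam_proj = nn.Linear(sam_dim, hidden)
        # Learnable queries cross-attend to the fused visual tokens.
        self.queries = nn.Parameter(torch.randn(n_queries, hidden))
        self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8,
                                                batch_first=True)
        # Map query outputs into the LLM embedding space (4096 assumed).
        self.to_llm = nn.Linear(hidden, 4096)

    def forward(self, clip_tokens, sam_tokens):
        # clip_tokens: (B, N1, clip_dim); sam_tokens: (B, N2, sam_dim)
        vis = torch.cat([self.clip_proj(clip_tokens),
                         self.sam_proj(sam_tokens)], dim=1)
        q = self.queries.unsqueeze(0).expand(vis.size(0), -1, -1)
        fused, _ = self.cross_attn(q, vis, vis)
        return self.to_llm(fused)  # soft prompts prepended to the LLM input

# Dummy tensors standing in for real CLIP/SAM features of a CBCT slice.
fusion = DualEncoderQFormer()
prompts = fusion(torch.randn(2, 257, 1024), torch.randn(2, 64, 256))
print(prompts.shape)  # torch.Size([2, 32, 4096])
```

Under this sketch, the resulting query embeddings would serve as visual soft prompts for the frozen LLM during the image-text instruction tuning the abstract mentions.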
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: generative models, few-shot learning, healthcare applications, clinical NLP, biomedical QA, cross-modal information extraction
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Chinese
Submission Number: 279