Diabetica: Developing Specialized Large Language Models for Diabetes Care with Multi-Faceted Benchmarks and Clinical Validation
Keywords: Large Language Models for Diabetes Care, Benchmarks Construction, Clinical Assessments
Abstract: Diabetes is a chronic disease with a significant global health burden, yet the effectiveness of Large Language Models (LLMs) across diverse diabetes tasks remains unproven. To address this, we introduce a comprehensive framework for developing and evaluating diabetes-specialized LLMs. Our methodology begins with systematically collecting unstructured diabetes-related data from sources such as clinical guidelines and medical textbooks. This data then undergoes rigorous processing, including filtering, transformation, and refinement via a novel self-distillation approach, to produce a high-quality training dataset and a multi-faceted benchmarking suite. Leveraging this training dataset, the resulting model, Diabetica, significantly outperforms existing open-source LLMs of comparable size across diabetes-specific benchmarks, including multiple-choice questions, fill-in-the-blank assessments, and open-ended dialogue scenarios. To demonstrate real-world applicability, we conduct extensive clinical assessments in three key use cases: medical counseling, medical education, and record summarization.
Results show that Diabetica provides more thorough and empathetic patient responses than human physicians, achieves expert-level performance on medical examinations, and significantly improves clinical documentation efficiency while maintaining high quality.
Paper Type: Long
Research Area: Clinical and Biomedical Applications
Research Area Keywords: Large Language Models, Medical AI, Medical Benchmarks
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English, Chinese
Submission Number: 1992