What do LLMs value? An evaluation framework for revealing subjective trade-offs in assessment of glycemic control

Published: 27 Nov 2025, Last Modified: 28 Nov 2025, ML4H 2025 Poster, CC BY 4.0
Keywords: large language models, LLM, diabetes, glucose monitoring, clinical decision making, evaluation
TL;DR: This work presents a framework to examine the values embedded in commercial large language models for assessing quality of glycemic control in the setting of diabetes.
Track: Proceedings
Abstract: Clinical decisions often require balancing conflicting priorities rather than simply selecting a single “correct” answer. We present an evaluation framework that probes the value judgments embedded in large language models (LLMs) by testing how they assess quality of glycemic control from continuous glucose monitoring (CGM) data. Using synthetic type 1 diabetes profiles, we asked five commercial LLMs to perform pairwise comparisons of CGM summary statistics and derived a percentile ranking for each profile. We then quantified alignment with two reference metrics: time in range (TIR) and the expert-derived Glycemia Risk Index (GRI), which was developed with clinician input regarding preferences across glycemic ranges. Across three insulin therapy modalities, newer models showed stronger correlation with GRI than older models, suggesting a generational shift toward expert consensus. However, a perturbation analysis revealed instances of disagreement around the weighting of mild hypoglycemia and mild hyperglycemia relative to the GRI. These results demonstrate that high average agreement with clinical metrics can mask clinically meaningful misalignments in how LLMs prioritize risks. Our proposed framework reveals how LLM outputs reflect competing priorities in clinical contexts.
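The pipeline described in the abstract (pairwise comparisons, a percentile ranking per profile, then correlation with a reference metric) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the profile values are made up, `toy_judge` is a hypothetical stand-in for an LLM call that simply prefers the higher time in range, the percentile is taken as the share of comparisons won, and the GRI weights follow the commonly cited definition (3.0, 2.4, 1.6, and 0.8 for very low, low, very high, and high, capped at 100), which the abstract itself does not state.

```python
from itertools import combinations

def gri(vlow, low, high, vhigh):
    """Glycemia Risk Index from percent time in each range, capped at 100.
    Weights are the commonly cited GRI weights (assumed, not from the abstract)."""
    return min(100.0, 3.0 * vlow + 2.4 * low + 1.6 * vhigh + 0.8 * high)

def percentile_from_pairwise(n, prefer):
    """Turn pairwise judgments into a 0-100 score per profile.
    prefer(i, j) returns the index of the profile judged to be better controlled.
    Using the share of comparisons won is an assumed aggregation rule."""
    wins = [0] * n
    for i, j in combinations(range(n), 2):
        wins[prefer(i, j)] += 1
    return [100.0 * w / (n - 1) for w in wins]

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            out[k] = mean_rank
        pos = end + 1
    return out

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Made-up CGM summaries: percent time very low, low, in range, high, very high.
profiles = [
    {"vlow": 0, "low": 2, "tir": 80, "high": 15, "vhigh": 3},
    {"vlow": 1, "low": 4, "tir": 70, "high": 20, "vhigh": 5},
    {"vlow": 3, "low": 6, "tir": 60, "high": 21, "vhigh": 10},
    {"vlow": 0, "low": 1, "tir": 50, "high": 30, "vhigh": 19},
]

def toy_judge(i, j):
    """Hypothetical stand-in for an LLM comparison: prefers higher time in range."""
    return i if profiles[i]["tir"] >= profiles[j]["tir"] else j

percentiles = percentile_from_pairwise(len(profiles), toy_judge)
gri_scores = [gri(p["vlow"], p["low"], p["high"], p["vhigh"]) for p in profiles]

# Lower GRI means lower risk, so strong agreement shows up as a correlation near -1.
rho = spearman(percentiles, gri_scores)
print(percentiles, gri_scores, rho)
```

Replacing `toy_judge` with a real model call, and perturbing a single range (for example, shifting time from mild hyperglycemia to mild hypoglycemia) while holding the others fixed, would be one way to probe the kind of weighting disagreements the abstract describes.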
General Area: Applications and Practice
Specific Subject Areas: Evaluation Methods & Validity, Natural Language Processing
Data And Code Availability: Yes
Ethics Board Approval: No
Entered Conflicts: I confirm the above
Anonymity: I confirm the above
Submission Number: 131