Predicting Failures of LLMs to Link Biomedical Ontology Terms to Identifiers: Evidence Across Models and Ontologies

Published: 19 Aug 2025, Last Modified: 12 Oct 2025 · BHI 2025 · CC BY 4.0
Confirmation: I have read and agree with the IEEE BHI 2025 conference submission's policy on behalf of myself and my co-authors.
Keywords: ontology, normalization, large language models, Gene Ontology, Human Phenotype Ontology
TL;DR: A systematic evaluation of why LLMs fail to link ontology terms to their correct identifiers, based on analyzing predictions across two major biomedical ontologies: HPO and GO-CC.
Abstract: Large language models (LLMs) often perform well on biomedical NLP tasks but may fail to link ontology terms to their correct identifiers (IDs). We investigate why these failures occur by analyzing predictions across two major ontologies—the Human Phenotype Ontology (HPO) and Gene Ontology–Cellular Component (GO-CC)—and two high-performing models, GPT-4o and Llama 3.1 405B. We evaluate nine candidate features related to term familiarity, identifier usage, morphology, and ontology structure. Univariate and multivariate analyses show that exposure to ontology identifiers is the strongest predictor of linking success; in contrast, features such as term length or ontology depth contribute little. Two unexpected findings emerged: (1) large “ontology deserts” of unused terms predict near-certain failure, and (2) the presence of leading zeroes in identifiers strongly predicts success in HPO. These results show that LLM linking errors are systematic and driven by limited exposure rather than random variability. Encouraging consistent reporting of ontology terms paired with their identifiers in the biomedical literature would reduce linking errors, improve normalization performance across ontologies such as HPO and GO, enhance annotation quality, and provide more reliable inputs for downstream classification and clinical decision-support systems.
Track: 2. Bioinformatics
Registration Id: Y6NRL2VQ82J
Submission Number: 383