Step-forward structuring disease phenotypic entities with LLMs for disease understanding

Alvaro Garcia-Barragán, Alberto González Calatayud, Lucía Prieto Santamaría, Víctor Robles, Ernestina Menasalvas, Alejandro Rodríguez

Published: 2024, Last Modified: 10 Sept 2024, CBMS 2024, CC BY-SA 4.0

Abstract: In the rapidly evolving field of biomedical text mining, extracting phenotypic entities from unstructured text remains a pivotal challenge. This paper introduces a novel method that leverages Large Language Models (LLMs) to extract phenotypic entities from freely available texts such as Wikipedia. Our approach goes beyond traditional Named Entity Recognition (NER) techniques by using both local and cloud-based LLMs. We present a comprehensive comparison with state-of-the-art tools. Our results confirm the significant advantages of LLMs in identifying relevant phenotypic entities, thereby enhancing the ability of researchers and clinicians to understand and respond to disease dynamics. Overall, this work underscores the potential of next-generation LLMs to redefine the standards for phenotypic entity extraction in biomedical research.
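The abstract does not specify the prompts, models, output format, or evaluation metric used. The following is only a minimal Python sketch of the general idea of LLM-based phenotype extraction, built on several assumptions. The model is treated as any callable that maps a prompt string to a completion string, so either a local or a cloud-based LLM could be plugged in. The model is asked to answer with a JSON list. Predictions are scored by exact-match precision, recall, and F1, which is one common way to compare against NER tools but not necessarily the authors' protocol. `PROMPT_TEMPLATE`, `extract_phenotypes`, `score`, and `fake_llm` are hypothetical names introduced for illustration.

```python
import json
from typing import Callable, Iterable

# Hypothetical prompt; the paper's actual prompts are not given in the abstract.
PROMPT_TEMPLATE = (
    "Extract all phenotypic entities (signs, symptoms, clinical findings) "
    "mentioned in the text below. Answer only with a JSON list of strings.\n\n"
    "Text:\n{text}"
)


def extract_phenotypes(text: str, llm: Callable[[str], str]) -> list[str]:
    """Ask an LLM (any callable: prompt -> completion) for phenotypic entities."""
    raw = llm(PROMPT_TEMPLATE.format(text=text))
    # Models sometimes wrap JSON in extra prose; keep only the outermost [...] span.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        return []
    try:
        items = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return []
    # Normalize case and whitespace, then drop duplicates while preserving order.
    seen: dict[str, None] = {}
    for item in items:
        if isinstance(item, str) and item.strip():
            seen.setdefault(" ".join(item.lower().split()), None)
    return list(seen)


def score(predicted: Iterable[str], gold: Iterable[str]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over normalized entity strings."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a local or cloud model; returns a canned answer.
    return 'Here you go: ["Fever", "joint pain", "fever", "Rash"]'


text = "Dengue fever typically presents with fever, rash and severe joint pain."
entities = extract_phenotypes(text, fake_llm)
print(entities)  # ['fever', 'joint pain', 'rash']
print(score(entities, {"fever", "rash", "joint pain", "headache"}))
```

Injecting the model as a plain callable keeps the parsing and scoring logic identical whether the backend is a locally hosted model or a cloud API. Only the `llm` argument changes.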