
Leveraging Large Language Models for Information Extraction: Identifying microRNA-Gene Interactions in Biomedical Literature

Steve Stavropoulos, Elissavet Zacharopoulou, Spiros V. Georgakopoulos, Sotiris K. Tasoulis, Vassilis P. Plagianakos, Artemis G. Hatzigeorgiou

Published: 2024 (CIBCB 2024). Last modified: 06 Feb 2025. License: CC BY-SA 4.0.

Abstract: The rapid growth of biomedical literature necessitates efficient Information Extraction systems able to identify relevant knowledge for various biological applications, such as understanding gene regulation by microRNAs (miRNAs). In this study, we employed a Large Language Model, specifically GPT-3.5 (version 0301), in conjunction with BERN2 to extract miRNA-gene interactions from paper titles and abstracts. We optimized our approach on an initial dataset of approximately 1,000 molecular biology papers and subsequently evaluated its performance on a manually curated dataset of 400 papers, achieving an accuracy of 82–85%. Encouraged by these results and the practical utility of the method, we applied the system to a large dataset of 39,000 papers. The extracted miRNA-gene interactions, combined with those from a Natural Language Processing approach, were incorporated into the TarBase v9 database. Our findings demonstrate the potential of Large Language Models in biomedical Information Extraction tasks and highlight the limitations of current gene and miRNA recognition systems, which hinder further improvements in accuracy.
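The abstract describes a two-stage pipeline in which a named-entity recognizer (BERN2) identifies miRNAs and genes and an LLM (GPT-3.5) decides which pairs interact. The sketch below illustrates that general shape only and is not the authors' implementation. It does not call BERN2 or GPT-3.5: the entity lists and the model response are hypothetical stand-ins, and the prompt wording, the JSON answer format, the filtering of pairs against recognized entities, and the per-paper exact-match accuracy metric are all assumptions made for illustration.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """A single miRNA-gene pair (hypothetical representation)."""
    mirna: str
    gene: str


def build_prompt(title: str, abstract: str, mirnas: list[str], genes: list[str]) -> str:
    """Assemble a hypothetical prompt that restricts the LLM to entities
    already recognized by an NER step (e.g. BERN2 output, not shown here)."""
    return (
        "From the title and abstract below, list every miRNA-gene interaction "
        "supported by the text. Use only these miRNAs: "
        + ", ".join(mirnas)
        + ". Use only these genes: "
        + ", ".join(genes)
        + '. Answer as a JSON list of {"mirna": ..., "gene": ...} objects.\n\n'
        + f"Title: {title}\nAbstract: {abstract}"
    )


def parse_response(text: str, mirnas: list[str], genes: list[str]) -> set[Interaction]:
    """Extract the JSON list from the model's reply and keep only pairs whose
    miRNA and gene both appear in the recognized entity lists."""
    match = re.search(r"\[.*\]", text, re.DOTALL)
    if match is None:
        return set()
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return set()
    if not isinstance(items, list):
        return set()
    allowed_mirnas, allowed_genes = set(mirnas), set(genes)
    result: set[Interaction] = set()
    for item in items:
        if not isinstance(item, dict):
            continue
        mirna, gene = item.get("mirna"), item.get("gene")
        if mirna in allowed_mirnas and gene in allowed_genes:
            result.add(Interaction(mirna, gene))
    return result


def paper_accuracy(predicted: list[set[Interaction]], gold: list[set[Interaction]]) -> float:
    """Fraction of papers whose predicted set exactly matches the curated set.
    This is one possible definition of accuracy, assumed for illustration."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty prediction and gold lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    # Hypothetical NER output and LLM reply for one paper.
    mirnas, genes = ["miR-21", "miR-155"], ["PTEN", "SOCS1"]
    print(build_prompt("Example title", "Example abstract.", mirnas, genes))
    reply = 'Pairs: [{"mirna": "miR-21", "gene": "PTEN"}, {"mirna": "miR-9", "gene": "PTEN"}]'
    print(parse_response(reply, mirnas, genes))
```

Filtering the model's answer against the recognized entity lists is one way to keep the LLM from reporting names the NER step never found; it also means, as the abstract notes, that errors in gene and miRNA recognition cap the achievable accuracy.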