Leveraging Large Language Models for Medical Information Extraction and Query Generation

Published: 01 Jan 2024, Last Modified: 16 May 2025. Venue: WI/IAT 2024. License: CC BY-SA 4.0
Abstract: This paper introduces a system that integrates large language models (LLMs) into clinical trial retrieval, improving patient-trial matching while preserving data privacy and expert oversight. We evaluate six LLMs for query generation, focusing on small, open-source models that require minimal computational resources. Our findings show that these models achieve retrieval effectiveness comparable to or exceeding that of expert-created queries, and that they consistently outperform standard baselines and previously published approaches. The best-performing LLMs respond quickly (1.7–8 seconds) and generate a manageable number of query terms (15–63). Our results suggest that small, open-source LLMs can effectively balance performance, computational efficiency, and real-world applicability in clinical trial retrieval.