Keywords: medical cohort analysis, risk assessment, generalization, prompt engineering, open-source model
TL;DR: We introduce a method that leverages LLMs with automatically generated, knowledge-rich prompts to improve medical cohort analysis, enabling comprehensive risk assessment across a significantly larger set of factors.
Abstract: Driven by the goal of data-driven analysis of large-scale cohorts, large language models (LLMs) have become a central focus of medical artificial intelligence research. However, existing efforts have coalesced around a small set of well-studied evidence, leaving behind the vast majority of factors collected in cohort investigations. What does it take to cover more than 70 factors while ensuring responsible, high-quality prediction and keeping medical considerations in mind? In No Factor Left Behind, we first contextualize the need for premature rupture of membranes (PROM) risk assessment with numerical, interpretable evidence gathered through exploratory interviews with domain experts. We then create datasets and models aimed at narrowing the performance gap between low- and high-frequency factors. More specifically, we develop a model based on factor-value pairs, trained on data obtained with robust and effective data mining techniques tailored to low-frequency factors, and we propose multiple architectural and training improvements to counteract overfitting when training on 70 factors. Critically, we interpret the PROM risk of over 7,000 cohort participants using numerical, interpretable evidence with precise factor values, combined with human evaluation covering all factors in the dataset to assess medical safety. Our model achieves 79\% accuracy (78 factors) and 96\% accuracy (40 factors) for screening-level risk assessment, offering novel insight toward a general medical cohort analysis method in the era of LLMs.
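The abstract describes the approach only at a high level. As an illustrative sketch (not the authors' implementation), the snippet below shows one way factor-value pairs could be assembled into a knowledge-rich prompt for screening-level PROM risk assessment; the factor names, reference notes, and the `query_llm` stub are hypothetical placeholders.

```python
# Illustrative sketch: assembling factor-value pairs into a knowledge-rich prompt
# for LLM-based PROM risk assessment. Factor names, reference notes, and the
# query_llm stub are hypothetical placeholders, not the paper's actual pipeline.

def build_prompt(factor_values: dict, knowledge: dict) -> str:
    """Format each factor as 'name = value [domain note]' and wrap with task instructions."""
    lines = []
    for name, value in factor_values.items():
        note = knowledge.get(name, "no reference note available")
        lines.append(f"- {name} = {value}  [{note}]")
    factors_block = "\n".join(lines)
    return (
        "You are assisting with screening-level risk assessment for premature "
        "rupture of membranes (PROM).\n"
        "Cohort participant factors:\n"
        f"{factors_block}\n"
        "Based on these factor values, output a risk level (low/medium/high) "
        "and cite the specific factor values that drive the assessment."
    )

# Hypothetical example inputs.
factors = {"maternal_age": 34, "BMI": 27.5, "gestational_week": 32}
notes = {"BMI": "pre-pregnancy BMI; >25 is commonly flagged as elevated"}

prompt = build_prompt(factors, notes)
# response = query_llm(prompt)  # query_llm stands in for any LLM client call
print(prompt)
```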
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6277