Keywords: Large language models, fine-tuning, fairness, benchmarking
Abstract: We introduce a novel and extensive instruction-tuning dataset built from echocardiogram reports sourced from MIMIC-IV. This dataset is specifically tailored to enhance question answering (QA) systems in cardiology. It comprises 765,605 QA pairs addressing a wide array of cardiac abnormalities and their severity. To validate the utility of this benchmark dataset, we evaluate various large language models (LLMs), encompassing both open-source general-purpose and biomedical-specific models, along with state-of-the-art closed-source models in a zero-shot setting. Our results reveal that certain models achieve superior performance across all evaluated metrics, underscoring the effectiveness of instruction fine-tuning on echocardiogram data. Additionally, we audit the best-performing LLM across demographic groups and marginalized populations. Our objective is to propel the field forward by establishing a benchmark framework for developing LLM-based AI agents that support clinicians in their daily workflow within the cardiology space. By making this dataset available, we aim to advance natural language models for diagnostic decision support systems, increasing efficiency and reducing diagnostic errors in cardiology care. All code will be available on GitHub, and the data will be made available on the HIPAA-compliant data repository PhysioNet.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13544