SoftTiger: A Clinical Foundation Model for Healthcare Workflows

Published: 29 Feb 2024, Last Modified: 02 May 2024, Venue: AAAI 2024 SSS on Clinical FMs, License: CC BY 4.0
Track: Traditional track
Keywords: Large Language Model, Clinical Large Language Models, Clinical notes, International patient summary
TL;DR: A clinical foundation model for healthcare workflows
Abstract: We introduce SoftTiger, a clinical large language model (CLaM) designed as a foundation model for healthcare workflows. The narrative, unstructured nature of clinical notes is a major obstacle to making healthcare intelligent. We address the critical problem of structuring clinical notes into clinical data according to international interoperability standards. We collect and annotate data for three subtasks: international patient summary, clinical impression, and medical encounter. We then apply supervised fine-tuning to a state-of-the-art LLM using public and credentialed clinical data. Training is staged so that the target model first supports basic clinical tasks, such as abbreviation expansion and temporal information extraction, and then learns to perform more complex downstream clinical tasks. We also address several modeling challenges specific to the healthcare context, e.g., an extra-long context window. Our blind pairwise evaluation shows that SoftTiger outperforms other popular open-source models and GPT-3.5, is comparable to Gemini-pro, and trails GPT-4 by a mild gap. We believe that LLMs may become a stepping stone towards healthcare digitalization and democratization. We therefore publicly release SoftTiger models at scales of 13 billion and 70 billion parameters, as well as datasets and code for our innovative scalable evaluation, in the hope of making a significant contribution to the healthcare industry.
Presentation And Attendance Policy: I have read and agree with the symposium's policy on behalf of myself and my co-authors.
Ethics Board Approval: Yes, we have/will include(d) information about IRB approval or its equivalent, in the manuscript.
Data And Code Availability: Yes, we will make data and code available upon acceptance.
Primary Area: Clinical foundation models
Student First Author: No, the primary author of the manuscript is NOT a student.
Submission Number: 15