TNM Tumor Classification from Unstructured Breast Cancer Pathology Reports using LoRA Finetuning of Mistral 7B

Published: 29 Feb 2024, Last Modified: 02 May 2024
Venue: AAAI 2024 SSS on Clinical FMs
Readers: Everyone
License: CC BY 4.0
Track: Traditional track
Keywords: clinical foundation models, large language models, mistral, tumor classification, low rank adaptation, fine-tuning
TL;DR: This paper focuses on using Low-Rank Adaptation (LoRA) fine-tuning of smaller foundation language models to perform TNM staging accurately and efficiently on unstructured pathology reports for triple-negative breast cancer cases.
Abstract: Over the past year, large language models have seen an explosion in usage, with researchers and companies rushing to discover new applications. This explosion was kick-started by OpenAI's release of GPT-3.5 and GPT-4 to the general public. These foundation models have proven extraordinarily capable on a wide range of tasks, but their cost and reliability present problems for more sensitive or resource-limited applications. Over the same time span, however, there has also been rapid development of smaller foundation models, such as Mistral's 7B model, and of techniques for fine-tuning those models for specific tasks. In this paper, we explore the application of Low-Rank Adaptation (LoRA) fine-tuning of small language models to TNM staging on unstructured pathology reports for triple-negative breast cancer cases. We also attempt to develop a more generalized approach, so that our work can be applied to other NLP tasks within the medical field. We found that a small foundation model can perform TNM staging with reliable accuracy after fine-tuning, allowing fast and reliable automation of critical language processing tasks within medicine.
Presentation And Attendance Policy: I have read and agree with the symposium's policy on behalf of myself and my co-authors.
Ethics Board Approval: No, our research does not involve datasets that need IRB approval or its equivalent.
Data And Code Availability: Yes, we will make data and code available upon acceptance.
Primary Area: Clinical foundation models
Student First Author: Yes, the primary author of the manuscript is a student.
Submission Number: 20