Abstract: In medical domains, hospitals and medical research institutions produce large-scale real-world data with physician-annotated diagnoses every day. An ideal solution would be to use these data for fine-tuning (FT) when developing large language models (LLMs) for medical domains. However, given patients' privacy, there is a risk that de-identification is not performed carefully and that LLMs memorize patient information during FT. In contrast, in-context learning (ICL) relies only on few-shot demonstrations. LLMs with ICL perform considerably better than zero-shot inference, making ICL a possible alternative to FT, because ICL can efficiently adapt to new tasks by learning from the given demonstrations. Moreover, medical institutions can keep their data locally and share only a limited amount of de-identified data when needed, rather than exposing all sensitive data for FT. However, the current consensus is that there is a significant performance gap between ICL and FT. Furthermore, under the multi-task scenario, FT usually suffers from task imbalance, whereas ICL in this setting is underexplored. In this paper, we compare ICL and FT under a multi-task setting and explore their performance gap. Empirical studies show that an advanced ICL method already achieves performance comparable to FT under the multi-task scenario, demonstrating its great potential in medical domains.
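To make the FT/ICL contrast concrete, below is a minimal sketch of how a few-shot ICL prompt for a clinical diagnosis task could be assembled from a handful of de-identified demonstrations, with no gradient updates to the model. All names, demonstrations, and task wording are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch of few-shot in-context learning (ICL) for a clinical
# diagnosis task. Unlike fine-tuning, only a few de-identified
# demonstrations are placed in the prompt; the model's weights are
# never updated. All example text and labels below are hypothetical.

def build_icl_prompt(demonstrations, query_note):
    """Concatenate de-identified demonstrations with the query note.

    Each demonstration is an (input_text, diagnosis_label) pair that a
    medical institution can share after de-identification, instead of
    exposing its full dataset for fine-tuning.
    """
    parts = ["Assign a diagnosis label to each clinical note.\n"]
    for note, label in demonstrations:
        parts.append(f"Note: {note}\nDiagnosis: {label}\n")
    parts.append(f"Note: {query_note}\nDiagnosis:")
    return "\n".join(parts)

# Hypothetical usage: two de-identified demonstrations plus one query.
demos = [
    ("Patient reports fever and productive cough for 3 days.", "Pneumonia"),
    ("Patient reports polyuria, polydipsia, and weight loss.", "Type 2 diabetes"),
]
prompt = build_icl_prompt(
    demos, "Patient reports chest pain radiating to the left arm."
)
print(prompt)  # This prompt would then be sent to an LLM for inference.
```

In a multi-task setting, the same mechanism applies per task: each task contributes its own small demonstration set, which avoids the data imbalance that a single multi-task FT run must contend with.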
Paper Type: Short
Research Area: NLP Applications
Research Area Keywords: healthcare applications, clinical NLP, analysis, data influence
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings, Data analysis
Languages Studied: Japanese
Submission Number: 3013