How many examples does it take for fine-tuning to outperform few-shot prompting? A study of medical text classification and domain adaptation

ACL ARR 2024 December Submission 739 Authors

15 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract: Given the recent success of large language models, a critical question for machine learning engineers is when to use few-shot prompting versus fine-tuning. We explore this question in a medical setting, where data restrictions make only a small number of training examples realistic, and where the ability to adapt from one domain to another is critical. On two medical text classification tasks, we find that fine-tuning outperforms few-shot prompting with as few as 100 labeled examples, and that few-shot prompting carries a greater risk of robustness problems.
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: data-efficient training, NLP in resource-constrained settings
Contribution Types: Approaches to low-resource settings, Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 739