How many examples does it take for fine-tuning to outperform few-shot prompting? A study of medical text classification and domain adaptation

ACL ARR 2024 December Submission 739 Authors

15 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract: Given the recent success of large language models, a critical question for machine learning engineers is when to use few-shot prompting versus fine-tuning. We explore this question in a medical setting, where data restrictions make only a small number of training examples realistic, and where the ability to adapt from one domain to another is critical. On two medical text classification tasks, we find that fine-tuning outperforms few-shot prompting with as few as 100 labeled examples, and that few-shot prompting carries a greater risk of robustness problems.
Paper Type: Short
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: data-efficient training, NLP in resource-constrained settings
Contribution Types: Approaches to low-resource settings, Approaches for low compute settings-efficiency
Languages Studied: English
Submission Number: 739