Keywords: model-adaptation, in-context-learning, sample-efficiency
Abstract: When adapting large language models (LLMs) to a specific downstream task, two primary approaches are commonly employed: (1) prompt engineering with in-context few-shot learning, leveraging the model’s inherent generalization abilities, and (2) fine-tuning on task-specific data, directly optimizing the model’s parameters. While prompt-based methods excel in few-shot scenarios, their effectiveness often plateaus as more data becomes available. Conversely, fine-tuning scales well with data but may underperform when training examples are scarce. We investigate a unified approach that bridges these two paradigms by incorporating in-context learning directly into the fine-tuning process. Specifically, we fine-tune the model on task-specific data augmented with in-context examples, mimicking the structure of k-shot prompts. This approach, while requiring per-task fine-tuning, combines the sample efficiency of in-context learning with the performance gains of fine-tuning, leading to a method that consistently matches and often significantly exceeds both baselines. With an emphasis on practicality, we introduce a hyperparameter optimization strategy based on prequential evaluation, which is effective in data-limited scenarios and eliminates the need for expensive cross-validation. We conduct an extensive empirical study to investigate the sample efficiency of fine-tuning, in-context learning, and the proposed unified approach across a diverse range of downstream tasks.
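The abstract's core idea, fine-tuning on task data augmented with in-context examples so that each training instance mimics a k-shot prompt, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the function name, the `Input:`/`Output:` prompt template, and the prompt/completion split are assumptions made for the example.

```python
import random

def build_kshot_example(target, support_pool, k=4, seed=0):
    """Format one fine-tuning instance as a k-shot prompt (hypothetical sketch).

    `target` and each item in `support_pool` are (input_text, output_text)
    pairs. k demonstrations are drawn from the rest of the training set and
    prepended to the target input, mirroring an in-context k-shot prompt.
    """
    rng = random.Random(seed)
    # Sample k demonstrations, excluding the target example itself.
    candidates = [ex for ex in support_pool if ex != target]
    shots = rng.sample(candidates, min(k, len(candidates)))
    parts = [f"Input: {x}\nOutput: {y}" for x, y in shots]
    parts.append(f"Input: {target[0]}\nOutput:")
    prompt = "\n\n".join(parts)
    # Under this setup, the fine-tuning loss would be applied to the
    # completion (the target's output) only, as in standard prompt tuning.
    completion = " " + target[1]
    return prompt, completion
```

At inference time the same k-shot template would be used, so the fine-tuned model sees prompts with the structure it was trained on.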
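The prequential evaluation mentioned for hyperparameter optimization can be illustrated with a generic predict-then-update loop: data is split into ordered blocks, each block is scored by a model trained only on the blocks before it, and the cumulative loss ranks hyperparameter settings without a held-out validation split. This is a hedged sketch of the general prequential idea, not the paper's exact procedure; `train_fn`, `loss_fn`, and the block sizes are placeholders.

```python
def prequential_score(train_fn, loss_fn, data, block_sizes=(4, 8, 16)):
    """Cumulative predict-then-update loss for one hyperparameter setting.

    train_fn(examples) returns a fitted model; loss_fn(model, examples)
    returns its average loss on held-out examples. Each block is scored
    with a model trained only on the data preceding it, then absorbed
    into the training prefix. Lower cumulative loss is better.
    """
    blocks, start = [], 0
    for size in block_sizes:
        block = data[start:start + size]
        if not block:
            break
        blocks.append(block)
        start += size
    total, prefix = 0.0, list(blocks[0])  # first block only trains, never scored
    for block in blocks[1:]:
        model = train_fn(prefix)       # fit on everything seen so far
        total += loss_fn(model, block)  # score the not-yet-seen block
        prefix += block                 # then add it to the training prefix
    return total
```

Because every example is used for both training and evaluation at different points in the sequence, this kind of criterion can remain informative in the data-limited regimes the abstract targets, where carving out a validation fold would be wasteful.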
Primary Area: foundation or frontier models, including LLMs
Submission Number: 24522