Abstract: Large Language Models (LLMs) have been observed to perform well on a wide range of downstream tasks when fine-tuned on domain-specific data. However, such data may not be readily available in many applications, motivating zero-shot or few-shot approaches that use existing domain- or task-adjacent (fine-tuned) models, which we call DAFT models. While many fine-tuned models for various tasks are available, finding one appropriate DAFT model for a given task is often not straightforward. In this paper, we explore different techniques for utilizing these existing DAFT models on data-scarce problems, i.e., tasks for which data is unavailable or limited. We observe that for zero-shot problems, ensembling the DAFT models yields accuracy close to that of the single best model. For few-shot problems (where a small amount of data from the target domain is available), this performance can be improved further by selecting, or assigning greater weight to, the DAFT models that are expected to perform better on the target task.
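For concreteness, the following is a minimal sketch of the ensembling idea described in the abstract: uniform averaging of per-model predictions for the zero-shot case, and accuracy-based weights estimated from the few labeled examples for the few-shot case. The function names, the use of NumPy, and the choice of averaging logits are illustrative assumptions, not the implementation from the paper.

```python
import numpy as np

def daft_ensemble(logits_per_model, weights=None):
    """Combine per-model class logits into an ensemble prediction.

    logits_per_model: array of shape (n_models, n_examples, n_classes).
    weights: optional per-model weights (e.g., few-shot validation
             accuracies); None gives uniform averaging (zero-shot case).
    """
    logits = np.asarray(logits_per_model, dtype=float)
    if weights is None:
        weights = np.ones(logits.shape[0])       # zero-shot: uniform ensemble
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                              # normalize to a convex combination
    combined = np.tensordot(w, logits, axes=1)   # weighted average over models
    return combined.argmax(axis=-1)              # predicted class per example

def few_shot_weights(few_shot_logits, few_shot_labels):
    """Weight each DAFT model by its accuracy on the few labeled examples.

    few_shot_logits: array of shape (n_models, n_few, n_classes).
    few_shot_labels: array of shape (n_few,).
    """
    preds = np.asarray(few_shot_logits).argmax(axis=-1)
    return (preds == np.asarray(few_shot_labels)).mean(axis=1)
```

Under these assumptions, the few-shot variant simply calls `daft_ensemble(logits, weights=few_shot_weights(...))`, so better-matched DAFT models contribute more to the combined prediction; hard model selection corresponds to a one-hot weight vector.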
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We have revised the manuscript to address the minor revision requested by the Action Editor. The specific modifications are as follows:
A new figure and accompanying text have been added to illustrate and discuss the performance of DAFT models on 9 medical-related datasets from MMLU (see Page 6, last paragraph, and Figure 4).
Figure 7 and the corresponding discussion on Page 9 present a comparison of the performance of $DAFT\text{-}E^z$ against the individual DAFT models on these 9 datasets.
The potential advantages of employing $DAFT\text{-}E$ for the MMLU datasets are analyzed on Pages 11–12, supported by the results shown in Figure 10.
Thanks,
Authors
Submission Number: 4175