Release Opt Out: No, I don't wish to opt out of paper release. My paper should be released.
Keywords: data attribution
Abstract: Choice of training data distribution greatly affects model behavior. Yet, in
large-scale settings, precisely characterizing *how* changes in training
data influence predictions is often difficult due to model training costs.
Current practice is to instead extrapolate from scaled down,
inexpensive-to-train proxy models. However, changes in data do not influence
smaller and larger models identically. Therefore, understanding how choice of
data affects large-scale models raises the question: how does training data
influence model behavior across compute scale? We find that the answer is
nuanced. Small- and large-scale language model predictions generally *do*
correlate highly across choice of training data---often, even when small-model
predictions are at the level of random guessing. However, there *also* exist
training datasets for which these predictions correlate much less. Equipped with these
findings, we characterize how proxy scale affects performance in two downstream
proxy model applications: data attribution and dataset selection.
Submission Number: 17