Keywords: learning curve extrapolation, prior-data fitted networks, transformers, Bayesian inference, uncertainty estimation
TL;DR: We show that Prior-data Fitted Networks (PFNs) compare favorably against MCMC for learning curve inference.
Abstract: Learning curve extrapolation aims to predict model performance in later epochs of a machine learning training run, based on the performance in the first k epochs. In this work, we argue that, while the varying difficulty of extrapolating learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive and/or (ii) computationally expensive. We describe the first application of prior-data fitted neural networks (PFNs) in this context. PFNs use a transformer, pre-trained on data generated from a prior, to perform approximate Bayesian inference in a single forward pass. We present preliminary results demonstrating that PFNs can approximate the posterior predictive distribution more accurately than MCMC while being multiple orders of magnitude faster, and that they achieve a lower average error when predicting the final accuracy of real learning curves from LCBench.
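
To make "data generated from a prior" concrete, the following is a minimal sketch of how synthetic training examples for such a network might be drawn. It is an illustration under stated assumptions, not the prior used in this work: the power-law curve family, the parameter ranges, and the names `sample_curve` and `make_training_example` are all hypothetical. Each example is a curve sampled from the prior and split at a random cutoff k into observed epochs (context) and later epochs (targets).

```python
import random


def sample_curve(n_epochs, rng):
    """Draw one synthetic learning curve from a hypothetical power-law prior."""
    # All ranges below are illustrative assumptions, not values from the paper.
    y_inf = rng.uniform(0.5, 1.0)    # asymptotic accuracy
    y_0 = rng.uniform(0.0, y_inf)    # accuracy at the first epoch
    alpha = rng.uniform(0.2, 2.0)    # convergence rate
    sigma = rng.uniform(0.0, 0.02)   # observation noise scale
    curve = []
    for t in range(1, n_epochs + 1):
        # Mean curve starts at y_0 (t = 1) and approaches y_inf as t grows.
        mean = y_inf - (y_inf - y_0) * t ** (-alpha)
        noisy = mean + rng.gauss(0.0, sigma)
        curve.append(min(1.0, max(0.0, noisy)))  # clip to a valid accuracy
    return curve


def make_training_example(n_epochs, rng):
    """Split a sampled curve into observed epochs and epochs to be predicted."""
    curve = sample_curve(n_epochs, rng)
    k = rng.randint(1, n_epochs - 1)  # number of observed epochs
    context = [(t + 1, y) for t, y in enumerate(curve[:k])]
    targets = [(t + 1, y) for t, y in enumerate(curve[k:], start=k)]
    return context, targets


rng = random.Random(0)
context, targets = make_training_example(50, rng)
print(len(context), "observed epochs,", len(targets), "epochs to predict")
```

In this sketch, pre-training would consist of repeatedly drawing such (context, targets) pairs and training the transformer to predict the target values given the context; at inference time the observed prefix of a real curve plays the role of the context.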