Keywords: Minimum Description Length, TabPFN, In-context learning, Prior-Data Fitted Networks (PFNs), Bayesian Mixture Code
TL;DR: Prompting TabPFN to approximate the Bayes mixture code via in-context learning induces a prequential coding scheme whose codelengths on 24 OpenML-CC18 tabular classification tasks are consistently shorter than those of prequential SGD-trained MLPs.
Abstract: The Minimum Description Length (MDL) principle, a model selection framework based on Occam’s razor, is typically studied through universal codes and their associated codelengths. Blier & Ollivier (2018) studied the MDL principle for deep neural networks, comparing various codes and finding that a prequential code, based on neural networks trained through stochastic gradient descent, has a shorter codelength than variational and two-part codes. Recent developments in deep learning point to a better way to approximate the Bayes mixture code than the variational code. Specifically, Hollmann et al. (2023) present a transformer architecture, called Tabular Prior-Data Fitted Networks (TabPFNs), which are trained on synthetic data generated from a vast array of prior-likelihood pairs and are thereby encouraged to learn the corresponding Bayes posterior predictive distribution. We use TabPFN to induce a code through in-context learning and demonstrate on real-world datasets from the OpenML-CC18 suite that the resulting code is consistently shorter than the prequential code corresponding to MLPs.
Submission Number: 16
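To make the prequential coding scheme concrete, the following is a minimal sketch of how such a codelength could be computed by prompting TabPFN with the previously seen examples as in-context data. It assumes the `tabpfn` package's sklearn-style `TabPFNClassifier` (`fit` / `predict_proba`); the helper `prequential_codelength_bits` and the block schedule `block_starts` are hypothetical names introduced here for illustration, in the spirit of the block-wise prequential code of Blier & Ollivier (2018), and this is not the authors' implementation.

```python
import numpy as np
from tabpfn import TabPFNClassifier  # assumed: sklearn-style fit / predict_proba API


def prequential_codelength_bits(X, y, block_starts, n_classes):
    """Prequential codelength of the labels y, in bits (illustrative sketch).

    Each block of labels is encoded with the predictive distribution obtained by
    prompting TabPFN with all previously seen (x, y) pairs as in-context examples;
    the first block, which has no context, is encoded with a uniform code.
    """
    total_bits = 0.0
    for i, start in enumerate(block_starts):
        stop = block_starts[i + 1] if i + 1 < len(block_starts) else len(y)
        if start == 0:
            # No context yet: uniform code over the label alphabet.
            total_bits += (stop - start) * np.log2(n_classes)
            continue
        clf = TabPFNClassifier()
        clf.fit(X[:start], y[:start])              # past data acts as the prompt
        probs = clf.predict_proba(X[start:stop])   # predictive distribution per example
        # Map each true label to its predict_proba column via clf.classes_;
        # labels unseen in the context get probability 0 (clipped below).
        col = {c: j for j, c in enumerate(clf.classes_)}
        p_true = np.array([probs[k, col[c]] if c in col else 0.0
                           for k, c in enumerate(y[start:stop])])
        total_bits += float(-np.log2(np.clip(p_true, 1e-12, None)).sum())
    return total_bits
```

As a usage sketch, `block_starts = [0, 8, 16, 32, 64, ...]` would encode the labels in doubling chunks, refitting the in-context prompt before each chunk. Note that TabPFN imposes limits on context size and number of classes, so large datasets may require subsampling the context.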