Keywords: Minimum Description Length, TabPFN, In-context learning, Prior-Data Fitted Networks (PFNs), Bayesian Mixture Code
TL;DR: Prompting TabPFN to approximate the Bayes mixture code via in-context learning induces a prequential coding scheme whose codelengths on 24 OpenML-CC18 tabular classification tasks are consistently shorter than those of prequential SGD-trained MLPs.
Abstract: The Minimum Description Length (MDL) principle, a model selection framework based on Occam’s razor, is typically studied through universal codes and their associated codelengths. Blier & Ollivier (2018) studied the MDL principle for deep neural networks, comparing various codes and finding that a prequential code, based on neural networks trained through stochastic gradient descent, has a shorter codelength than variational and two-part codes. Recent developments in deep learning point to a better way to approximate the Bayes mixture code than the variational code. Specifically, Hollmann et al. (2023) present a transformer architecture, called Tabular Prior-Data Fitted Networks (TabPFNs), which are trained on synthetic data generated from a vast array of prior-likelihood pairs and are thereby encouraged to learn the corresponding Bayes posterior predictive distribution. We use TabPFN to induce a code through in-context learning and demonstrate on real-world datasets from the OpenML-CC18 suite that the resulting code is consistently shorter than the prequential code corresponding to MLPs.
Submission Number: 16
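To make the prequential coding scheme concrete, the following is a minimal sketch of how such a codelength could be computed by prompting TabPFN with the previously seen examples as in-context data. It assumes the `tabpfn` package's sklearn-style `TabPFNClassifier` (`fit` / `predict_proba`); the helper `prequential_codelength_bits` and the block schedule `block_starts` are hypothetical names introduced here for illustration, in the spirit of the block-wise prequential code of Blier & Ollivier (2018), and this is not the authors' implementation.

```python
import numpy as np
from tabpfn import TabPFNClassifier  # assumed: sklearn-style fit / predict_proba API


def prequential_codelength_bits(X, y, block_starts, n_classes):
    """Prequential codelength of the labels y, in bits (illustrative sketch).

    Each block of labels is encoded with the predictive distribution obtained by
    prompting TabPFN with all previously seen (x, y) pairs as in-context examples;
    the first block, which has no context, is encoded with a uniform code.
    """
    total_bits = 0.0
    for i, start in enumerate(block_starts):
        stop = block_starts[i + 1] if i + 1 < len(block_starts) else len(y)
        if start == 0:
            # No context yet: uniform code over the label alphabet.
            total_bits += (stop - start) * np.log2(n_classes)
            continue
        clf = TabPFNClassifier()
        clf.fit(X[:start], y[:start])              # past data acts as the prompt
        probs = clf.predict_proba(X[start:stop])   # predictive distribution per example
        # Map each true label to its predict_proba column via clf.classes_;
        # labels unseen in the context get probability 0 (clipped below).
        col = {c: j for j, c in enumerate(clf.classes_)}
        p_true = np.array([probs[k, col[c]] if c in col else 0.0
                           for k, c in enumerate(y[start:stop])])
        total_bits += float(-np.log2(np.clip(p_true, 1e-12, None)).sum())
    return total_bits
```

As a usage sketch, `block_starts = [0, 8, 16, 32, 64, ...]` would encode the labels in doubling chunks, refitting the in-context prompt before each chunk. Note that TabPFN imposes limits on context size and number of classes, so large datasets may require subsampling the context.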