Statistically Indistinguishable, Operationally Distinct: A Formal Barrier for Tabular Foundation Models
Keywords: Enterprise Tabular Data, Operational Grounding, Tabular Foundation Models
Abstract: Tabular foundation models cannot reason about data produced by running systems without access to the rules that govern them. We make this claim falsifiable. The \emph{Operational Turing Test} (OTT) constructs pairs of legal and rule-violating database states whose $1$- and $2$-way column-value marginals match to TV $<0.02$; Le~Cam's lemma then lower-bounds the Bayes error of any values-only classifier at $\geq 0.49$. Three architectural families (XGBoost, TabICL, TabPFN) perform at chance, within the bound (accuracy $0.501$, pre-registered TOST $p<0.002$). Raw row-level access does not help, relational features close most of the gap but miss derivation entirely, and a classifier with seven rule-derived audit features reaches $0.9996$. Frontier reasoning models given the schema, trigger source, and rule tables in-prompt classify only $0$--$3$ of $50$ legal states as LEGAL across both prompt framings, two reasoning-effort levels, a $4\times$ token-budget sweep, and a SQL-executor variant. The access-ladder pattern also appears on a second schema with structurally distinct rule families (banking ledger: cross-row balance, cumulative aggregate). The barrier is identifiability, not capacity: scale, data, and richer features cannot cross it without operational grounding.
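The marginal-matching argument can be illustrated with a small sketch. It computes the maximum total-variation (TV) distance over all 1- and 2-way column-value marginals of two row sets, and applies Le Cam's two-point bound: with equal priors, no test between distributions $P$ and $Q$ can have error below $(1 - \mathrm{TV}(P,Q))/2$, so TV $<0.02$ gives an error floor of $0.49$. The toy rule (`c == a XOR b`) and the helper names (`marginal`, `max_marginal_tv`, `le_cam_error_floor`, `rule_audit`) are hypothetical and are not the OTT construction; unlike the setting above, where raw row-level access reportedly does not help, this toy rule can be read off a single row. It only shows that matched low-order marginals carry no signal while a rule check separates the states perfectly.

```python
from collections import Counter
from itertools import combinations, product


def marginal(rows, cols):
    """Empirical joint distribution of the given column indices."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    n = len(rows)
    return {k: v / n for k, v in counts.items()}


def tv(p, q):
    """Total variation distance between two finite distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_marginal_tv(rows_a, rows_b, max_order=2):
    """Largest TV distance over all 1..max_order-way column marginals."""
    n_cols = len(rows_a[0])
    return max(
        tv(marginal(rows_a, cols), marginal(rows_b, cols))
        for k in range(1, max_order + 1)
        for cols in combinations(range(n_cols), k)
    )


def le_cam_error_floor(tv_distance):
    """Le Cam two-point bound: min error of any equal-prior test is (1 - TV) / 2."""
    return (1.0 - tv_distance) / 2.0


def rule_audit(row):
    """Hypothetical rule-derived audit feature: is c equal to a XOR b?"""
    a, b, c = row
    return c == (a ^ b)


# Hypothetical toy states: legal rows satisfy c = a XOR b, violating rows negate it.
legal = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
violating = [(a, b, 1 - (a ^ b)) for a, b in product((0, 1), repeat=2)]

d = max_marginal_tv(legal, violating)
print("max 1/2-way marginal TV:", d)
print("Le Cam error floor at that TV:", le_cam_error_floor(d))
print("Le Cam error floor at TV = 0.02:", le_cam_error_floor(0.02))
print("audit on legal:", all(map(rule_audit, legal)),
      "audit on violating:", any(map(rule_audit, violating)))
```

In this toy every 1- and 2-way marginal of the two row sets is identical (TV $=0$), so a classifier restricted to those statistics is at chance, while the single audit feature labels every row correctly.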
Submission Number: 21