Bayes-Optimal Coexistence via Fact Localizability in Trainable-Feature Decoder-Only Transformers

Published: 04 Jun 2026, Last Modified: 04 Jun 2026
Venue: ICML MemFM 2026 Workshop Poster
License: CC BY 4.0
Keywords: exact memorization, fact localizability, Bayes-optimal generalization, trustworthy foundation models, decoder-only transformers, trainable-feature learning, rules-and-facts model, sparse fact interpolation, neural tangent kernel, lazy training, structural deletability, machine unlearning
Abstract: We give a representation-theoretic account of when exact memorization of sparse facts can coexist with Bayes-optimal rule generalization in trainable-feature decoder-only transformers. For a causal rules-and-facts model, we define the Bayes-coexistence gap $\Delta_{F_m}(\mathcal T)$ and the fact-localizability functional $\Lambda_{F_m}(\mathcal T;P_{\mathrm{rule}})$, and prove the exact squared-loss identity $\Delta_{F_m}=\Lambda_{F_m}$. A minimal trainable-feature decoder then admits a rule–residual factorization $A_{\bar\Theta}(X)=(S^\star(X),Z^\perp(X),0)$, where $S^\star$ carries the Bayes rule and $Z^\perp$ is an independent Gaussian residual block; sparse ReLU tents in this block interpolate arbitrary bounded facts with excess rule risk at most $\|\delta(F_m)\|_\infty^2\,\bigl[m\,e^{-25d_\perp/128}+m(m-1)\,e^{-75d_\perp/224}\bigr]$. Conversely, the affine lazy/tangent class of the same decoder has a nonvanishing coexistence gap unless its tangent kernel has sufficiently large effective dimension. The construction also yields exact structural deletability: selected memorized residual facts are removed by zeroing their residual-only MLP output coefficients, while the Bayes rule is unchanged.
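To make the abstract's bound and deletion mechanism concrete, here is a minimal sketch under stated assumptions; it is not the paper's construction. `excess_rule_risk_bound` only evaluates the stated expression, with `delta_inf` standing for $\|\delta(F_m)\|_\infty$ and `d_perp` for $d_\perp$; it does not reproduce the proof. The memory part is a hypothetical toy: residual keys are drawn as $z \sim \mathcal N(0, I_{d}/d)$, each fact gets one thresholded ReLU unit (a simplified stand-in for the paper's sparse ReLU tents), and the threshold `tau=0.5` is an arbitrary illustrative choice. The names `gaussian_key`, `ResidualReluMemory`, and `delete` are invented for this sketch.

```python
import math
import random


def excess_rule_risk_bound(delta_inf: float, m: int, d_perp: int) -> float:
    """Evaluate the abstract's bound on excess rule risk (value only, no proof):
    ||delta(F_m)||_inf^2 * [m exp(-25 d_perp / 128) + m (m - 1) exp(-75 d_perp / 224)]."""
    single = m * math.exp(-25.0 * d_perp / 128.0)
    pairwise = m * (m - 1) * math.exp(-75.0 * d_perp / 224.0)
    return delta_inf ** 2 * (single + pairwise)


def gaussian_key(d: int, rng: random.Random) -> list[float]:
    # Hypothetical residual key z ~ N(0, I_d / d), so ||z|| concentrates near 1.
    return [rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)]


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class ResidualReluMemory:
    """Toy residual-only memory: one thresholded ReLU unit per stored fact.
    A simplified stand-in for the paper's sparse ReLU tents, not their construction."""

    def __init__(self, keys, values, tau: float = 0.5):
        self.tau = tau    # hypothetical firing threshold
        self.dirs = []    # unit input directions, one per fact
        self.coefs = []   # output coefficients, one per fact
        for z, y in zip(keys, values):
            norm = math.sqrt(dot(z, z))
            u = [a / norm for a in z]
            self.dirs.append(u)
            # Scale so this unit alone outputs exactly y at its own key.
            self.coefs.append(y / (dot(u, z) - tau))

    def __call__(self, z) -> float:
        # Reads only the residual vector z; a rule head on S*(X) would be untouched.
        return sum(c * max(0.0, dot(u, z) - self.tau)
                   for u, c in zip(self.dirs, self.coefs))

    def delete(self, i: int) -> None:
        # Structural deletion: zero the i-th output coefficient only.
        self.coefs[i] = 0.0


rng = random.Random(0)
d, m = 256, 8
keys = [gaussian_key(d, rng) for _ in range(m)]
values = [rng.uniform(-1.0, 1.0) for _ in range(m)]
mem = ResidualReluMemory(keys, values)

print("max interpolation error:", max(abs(mem(z) - y) for z, y in zip(keys, values)))
print("output on a fresh residual:", mem(gaussian_key(d, rng)))
mem.delete(3)
print("fact 3 after deletion:", mem(keys[3]))
print("fact 0 error after deletion:", abs(mem(keys[0]) - values[0]))
print("abstract bound (delta_inf=1):", excess_rule_risk_bound(1.0, m, d))
```

The threshold works in this toy because, for a key or query $z$ independent of fact $j$, the projection $\langle u_j, z\rangle$ is exactly $\mathcal N(0, 1/d)$, so with $d=256$ the threshold $0.5$ sits eight standard deviations out and cross-activations are vanishingly rare. Rule and residual paths are separate by construction here, so deletion cannot alter a rule head; in the abstract, the analogous separation is attributed to the rule–residual factorization.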
Submission Number: 9