Learning Structured Dependencies using Generative Computational Units

ICLR 2026 Conference Submission 20015 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Computational Units, Supervised Learning, Generalization
TL;DR: We show that a different optimal computational unit emerges when assuming a switching-based structural causal model for data generation.
Abstract: The ability of neural networks to generalize from data is fundamentally shaped by the design of their computational units. Common examples include the perceptron and radial basis function (RBF) units, each of which provides useful inductive biases. In this work, we introduce a new computational unit, the generative matching unit (GMU), which is designed to naturally capture structured dependencies in data. Each GMU contains an internal generative model that infers latent parameters specific to an input instance $X$, and then outputs a non-linear function of the generative error under these parameters. By incorporating generative mechanisms into the unit itself, GMUs offer a complementary approach to existing computational units. In this work, we focus on linear GMUs, where the internal generative models are linear latent variable models, yielding a functional form that shares some similarities with RBF units. We show that linear GMUs are universal approximators like RBFs, while conveying richer information and lessening the impact of the curse of dimensionality compared to RBFs. Like perceptrons and RBF units, a linear GMU has its own set of weights and biases and admits a closed-form analytical expression, enabling fast computation. To evaluate the performance of linear GMUs, we conduct a comprehensive set of experiments comparing them to multi-layer perceptrons (MLPs), RBF networks, and ResNets. We construct GMU-ResNets, where the first feedforward layer is replaced by GMUs, and test on 27 tabular datasets, observing improved generalization over standard ResNets and competitive performance with the other benchmarks. We also construct GMU-CNNs, which contain convolutional GMUs in their first layer. Across five vision datasets, GMU-CNNs exhibit better generalization and significantly better robustness to test-time corruptions. Finally, we empirically compare linear GMUs to benchmark networks across more than 30 synthetic classification tasks, encompassing both structured and unstructured data distributions. We find that GMUs consistently demonstrate superior generalization to out-of-distribution samples, especially in the structured cases.
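To make the unit's mechanism concrete, the sketch below illustrates one plausible reading of a linear GMU: a unit holding a linear latent variable model $x \approx Wz + b$ that infers an instance-specific latent $z$ by least squares, measures the generative (reconstruction) error, and emits a non-linear function of that error, giving an RBF-like response. The class name `LinearGMU`, the least-squares inference, the exponential squashing, and the `gamma` parameter are illustrative assumptions, not the paper's closed-form definition.

```python
import numpy as np


class LinearGMU:
    """Minimal sketch of a single linear generative matching unit (GMU).

    Assumptions (not from the paper): latent inference via least squares,
    squared reconstruction error, and an exponential (RBF-like) squashing.
    """

    def __init__(self, in_dim: int, latent_dim: int, gamma: float = 1.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((in_dim, latent_dim))  # generative weights
        self.b = np.zeros(in_dim)                           # generative bias
        self.gamma = gamma                                   # error sensitivity (assumed)

    def __call__(self, x: np.ndarray) -> float:
        # Infer instance-specific latent: z* = argmin_z ||x - W z - b||^2
        z, *_ = np.linalg.lstsq(self.W, x - self.b, rcond=None)
        # Generative (reconstruction) error under the inferred latent
        err = np.sum((x - self.W @ z - self.b) ** 2)
        # Non-linear function of the generative error (RBF-like by assumption)
        return float(np.exp(-self.gamma * err))


# Usage: the unit responds strongly to inputs lying near its latent subspace
unit = LinearGMU(in_dim=8, latent_dim=2)
x_on_manifold = unit.W @ np.array([0.5, -1.0])                 # in the unit's subspace
x_off_manifold = np.random.default_rng(1).standard_normal(8)   # generic input
print(unit(x_on_manifold), unit(x_off_manifold))
```

Under this reading, the unit's response depends on how well the input is explained by its internal generative model, rather than on distance to a single center as in an RBF unit.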
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 20015