Keywords: supervised learning, classification, robustness
TL;DR: We propose a new computational unit for feedforward supervised learning architectures, called generative matching units.
Abstract: We propose an alternative computational unit for feedforward supervised learning architectures, called Generative Matching Units (GMUs). To understand GMUs, we start with the standard perceptron unit and view it as an undirected, symmetric computation between the weights $W=[w_1,w_2,\ldots,w_d]$ and each input datapoint $X=[x_1,x_2,\ldots,x_d]$. Perceptrons forward $W^TX+b$, usually followed by an activation function. In contrast, GMUs compute a directed, asymmetric measure that estimates the degree to which the input elements $x_i$ of each datapoint are functionally dependent on the weights $w_i$ through latent generative variables $\theta$, i.e., $f(w_i,\theta) \rightarrow x_i$. To estimate this functional dependency, GMUs measure the minimum error $\sum_i (f(w_i,\theta)-x_i)^2$ incurred in the generation process by optimizing $\theta$ for each input datapoint. GMUs then map this error into a functional dependency measure via an appropriate scalar function and forward it to the next layer for further computation. The weights $[w_1,w_2,\ldots,w_d]$ of a GMU can therefore be interpreted as \textit{generative weights}. We first compare the generalization ability of GMUs and multi-layer perceptrons (MLPs) via comprehensive synthetic experiments across a range of diverse settings. The most notable finding is that when the input is a sparse linear combination of latent generating variables, GMUs generalize significantly better than MLPs. Subsequently, we evaluate ResNet-style MLP networks whose first feedforward layer is replaced by GMUs (GMU-MLP) on 30 tabular datasets and find that in most cases, GMU-MLPs generalize better than the MLP baselines. We also compare GMU-MLP to a set of other benchmarks, including TabNet and XGBoost. Lastly, we evaluate GMU-CNNs on three standard vision datasets and find that in all cases they generalize better than the corresponding CNN baselines.
We also find that GMU-CNNs are significantly more robust to test-time corruptions.
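The forward pass of a single GMU can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: it assumes a linear generative map $f(w_i,\theta)=\theta w_i$ with a scalar latent $\theta$ (so the inner optimization has a closed-form least-squares solution) and uses $\exp(-\text{error})$ as one possible choice of scalar function mapping the generation error to a dependency score.

```python
import numpy as np

def gmu_forward(W, x):
    """One Generative Matching Unit (illustrative sketch).

    Assumes a linear generative map f(w_i, theta) = theta * w_i with a
    scalar latent theta, so the error-minimizing theta is the
    least-squares solution theta* = (W . x) / (W . W). The choice of f
    and of the error-to-dependency mapping exp(-err) are assumptions
    for illustration only.
    """
    theta = W @ x / (W @ W)             # argmin_theta sum_i (theta*w_i - x_i)^2
    err = np.sum((theta * W - x) ** 2)  # minimum generation error
    return np.exp(-err)                 # map error to a dependency score in (0, 1]
```

When the input is exactly generable from the weights (e.g., `x = 2 * W`), the error is zero and the unit outputs 1; inputs with no such dependency yield scores closer to 0.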
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9193