Keywords: Deep learning, Feature learning, Interpretability, Local discontinuities, Deep learning theory, Deep neural architectures, Supervised learning
TL;DR: Deep networks outperform kernel machines because gradient descent pulls model-function discontinuities toward label-function discontinuities; we introduce the Deep Linearly Gated Network (DLGN), whose discontinuities can be enumerated, making it competitive with ReLU networks while offering greater interpretability.
Abstract: Deep neural networks outperform kernel machines on several datasets due to feature learning that happens during gradient descent training. In this paper, we analyze the mechanism through which feature learning happens, using a notion of features that corresponds to discontinuities in the true label function. We hypothesize that the core feature-learning mechanism is label-function discontinuities attracting model-function discontinuities during training. To test this hypothesis, we perform experiments on classification data where the true label function is given by an oblique decision tree. This setup allows easy enumeration of the label function's discontinuities while remaining intractable for static kernel/linear methods. We then design a novel deep architecture called the Deep Linearly Gated Network (DLGN), whose discontinuities in the input space can be easily enumerated. In this setup, we provide evidence that model-function discontinuities move toward label-function discontinuities during training. The easy enumerability of discontinuities in the DLGN also enables greater mechanistic interpretability, which we demonstrate by extracting the parameters of a high-accuracy decision tree from the parameters of a trained DLGN. We also show that the DLGN is competitive with ReLU networks and other tree-learning algorithms on several real-world tabular datasets.
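To make the architectural claim concrete, here is a minimal sketch of the DLGN idea as the abstract describes it: a purely linear gating branch whose sign pattern gates a separate value branch. All layer sizes, the soft-gate sharpness `beta`, and the class/method names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class DLGNSketch(nn.Module):
    """Two-branch network: a purely linear gating branch whose (soft) sign
    pattern multiplies a separate value branch. Because the gating branch is
    linear in the input, each gate is a fixed hyperplane in input space, so
    the model's discontinuities can be enumerated in closed form."""

    def __init__(self, in_dim, width, depth, beta=10.0):
        super().__init__()
        self.beta = beta  # assumed sharpness of the sigmoid relaxation of the gate
        dims = [in_dim] + [width] * depth
        # Gating branch: linear maps only, no nonlinearity between layers.
        self.gates = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1], bias=False) for i in range(depth)]
        )
        # Value branch: its activations are multiplied by the gate values.
        self.values = nn.ModuleList(
            [nn.Linear(dims[i], dims[i + 1]) for i in range(depth)]
        )
        self.readout = nn.Linear(width, 1)

    def forward(self, x):
        g, v = x, x
        for gate_layer, value_layer in zip(self.gates, self.values):
            g = gate_layer(g)  # remains a linear function of x at every depth
            v = value_layer(v) * torch.sigmoid(self.beta * g)  # soft gating
        return self.readout(v)

    def gate_hyperplanes(self):
        """A gating neuron at depth l fires on {x : w^T x > 0}, where w is a
        row of the product W_l ... W_1. Returning these products enumerates
        every discontinuity surface of the hard-gated model."""
        planes, W = [], None
        for gate_layer in self.gates:
            W = gate_layer.weight if W is None else gate_layer.weight @ W
            planes.append(W.detach().clone())  # shape (width, in_dim) per layer
        return planes
```

With hard gates (replacing the sigmoid by the indicator 1[g > 0]), the hyperplanes returned by `gate_hyperplanes` are exactly the model's discontinuities; the sigmoid is a trainable relaxation. This enumerability is what the abstract exploits both to track discontinuity movement during training and to read off decision-tree parameters afterward.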
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1259