Position: A Theory of Deep Learning Must Include Compositional Sparsity

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 Position Paper Track (poster), License: CC BY 4.0
TL;DR: We posit that compositional sparsity, a property of all relevant functions, is the key to understanding the capabilities of deep learning to approximate, optimize, and generalize, but its theoretical foundations require further study.
Abstract: Overparametrized Deep Neural Networks (DNNs) have demonstrated remarkable success in a wide variety of domains that are too high-dimensional for classical shallow networks, which are subject to the curse of dimensionality. However, open questions remain about the fundamental principles that govern the learning dynamics of DNNs. In this position paper we argue that it is the ability of DNNs to exploit the compositionally sparse structure of the target function that drives their success. As such, DNNs can leverage the property that most practically relevant functions can be composed from a small set of constituent functions, each of which relies only on a low-dimensional subset of all inputs. We show that this property is shared by all efficiently Turing-computable functions and is therefore highly likely to be present in all current learning problems. While some promising theoretical insights into approximation and generalization exist in the setting of compositionally sparse functions, several important questions about the learnability and optimization of DNNs remain open. Completing the picture of the role of compositional sparsity in deep learning is essential to a comprehensive theory of artificial, and even general, intelligence.
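To make the notion of compositional sparsity concrete, here is a minimal worked example (our own illustration, not drawn from the paper): a function of eight variables built as a binary tree of bivariate constituents.

```latex
% Illustrative compositionally sparse function (hypothetical example):
% f depends on 8 inputs, but each constituent h_i depends on only 2 arguments.
f(x_1,\dots,x_8) = h_7\Big( h_5\big(h_1(x_1,x_2),\, h_2(x_3,x_4)\big),\;
                            h_6\big(h_3(x_5,x_6),\, h_4(x_7,x_8)\big) \Big)
```

A deep network whose layers mirror this tree only needs to approximate two-dimensional constituent functions, so the approximation cost scales with the arity of the constituents rather than with the full input dimension, whereas a shallow network approximating f directly would face an eight-dimensional problem.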
Lay Summary: A central mystery in artificial intelligence is why deep learning works so well, even on extremely complex problems. This paper argues that one of the key secrets lies in *compositional sparsity*: most real-world tasks can be broken down into many small, simple steps, each depending on only a few pieces of information. Deep neural networks are especially good at learning these kinds of step-by-step structures, which lets them avoid the usual explosion in complexity that plagues traditional methods. We demonstrate that this property is shared by all efficiently computable problems (that is, problems that computers can solve efficiently) and explain how it helps deep learning systems learn, generalize, and reason. However, important questions remain, such as how neural networks discover these hidden structures from training data and what makes some problems easier to learn than others. Understanding these principles could help us design smarter and more reliable AI systems in the future.
Primary Area: Research Priorities, Methodology, and Evaluation
Keywords: deep learning, neural networks, compositional sparsity, hierarchical learning, approximation, optimization, generalization, curse of dimensionality, chain-of-thought, transformers, large language models
Submission Number: 186