Breaking Neural Network Scaling Laws with Modularity

22 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Primary Area: general machine learning (i.e., none of the above)
Keywords: modularity, neural network, generalization, high-dimensional, scaling laws
TL;DR: Non-modular neural networks require a number of training samples that grows exponentially with task dimensionality; we break this barrier using modular neural networks.
Abstract: Modular neural networks outperform non-modular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to stem from modular networks' superior ability to model the compositional and combinatorial structure of real-world problems. However, a theoretical explanation of how modularity improves generalizability, and of how to leverage task modularity when training networks, remains elusive. Using recent theoretical progress in explaining neural network generalization, we investigate how the amount of training data required to generalize on a task varies with the intrinsic dimensionality of the task's input. We show theoretically that, on modularly structured tasks, non-modular networks require a number of samples that grows exponentially with task dimensionality, whereas modular networks' sample complexity is independent of task dimensionality: modular networks can generalize in high dimensions. We then develop a novel learning rule for modular networks that exploits this advantage and empirically show the rule's improved generalization, both in and out of distribution, on high-dimensional, modular tasks.
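To make the setting concrete, below is a minimal, hypothetical sketch (not the authors' code or learning rule) of a modularly structured task, in which the target decomposes into low-dimensional functions of disjoint input blocks, together with a modular network that mirrors that structure with one small MLP per block and a dense baseline that sees all inputs at once. The block size, widths, toy target, and training setup are illustrative assumptions; the intuition is that each module only has to learn a function of BLOCK inputs, so its sample requirements do not grow with the full input dimension D, while the dense network must fit a D-dimensional function.

```python
# Hypothetical sketch: a modularly structured task and a structure-matched modular network.
import torch
import torch.nn as nn

D, K = 32, 8                      # total input dimension, number of modules
BLOCK = D // K                    # each module sees only BLOCK = 4 input dimensions

def modular_task(x):
    """Target = sum of low-dimensional functions, each acting on a disjoint input block."""
    blocks = x.reshape(-1, K, BLOCK)
    return (torch.sin(blocks.sum(dim=2)) ** 2).sum(dim=1, keepdim=True)

class ModularNet(nn.Module):
    """One small MLP per input block; module outputs are summed, matching the task structure."""
    def __init__(self):
        super().__init__()
        self.blocks_mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(BLOCK, 32), nn.ReLU(), nn.Linear(32, 1))
            for _ in range(K)
        )

    def forward(self, x):
        blocks = x.reshape(-1, K, BLOCK)
        return sum(m(blocks[:, i]) for i, m in enumerate(self.blocks_mlps))

# A monolithic (non-modular) baseline that sees all D inputs at once.
dense = nn.Sequential(nn.Linear(D, 256), nn.ReLU(), nn.Linear(256, 1))

def fit_and_test(model, n_train=512, steps=2000):
    """Train on a fixed sample and report test MSE on fresh draws from the same distribution."""
    x, y = torch.randn(n_train, D), None
    y = modular_task(x)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    x_test = torch.randn(4096, D)
    return nn.functional.mse_loss(model(x_test), modular_task(x_test)).item()

print("modular test MSE:", fit_and_test(ModularNet()))
print("dense   test MSE:", fit_and_test(dense))
```

In this toy setup one can vary D while holding BLOCK and n_train fixed to probe how each architecture's generalization degrades with input dimensionality; the names (modular_task, ModularNet, fit_and_test) are placeholders for illustration only.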
Supplementary Material: zip
Submission Number: 6027