How Can Knowledge of a Task’s Modular Structure Improve Generalization and Training Efficiency?

Published: 18 Aug 2025 · Last Modified: 18 Aug 2025 · Accepted by TMLR · License: CC BY 4.0
Abstract: Many real-world learning tasks have an underlying hierarchical and modular structure, composed of smaller sub-functions. Traditional neural networks (NNs) often disregard this structure, leading to inefficiencies in learning and generalization. Prior work has demonstrated that leveraging known structural information can enhance performance by aligning NN architectures with the task’s inherent modularity. However, the extent of prior structural knowledge required to achieve these performance improvements remains unclear. In this work, we investigate how modular NNs can outperform traditional dense NNs on tasks with simple yet known modular structure by systematically varying the degree of structural knowledge incorporated. We compare architectures ranging from monolithic dense NNs, which assume no prior knowledge, to hierarchically modular NNs with shared modules that leverage sparsity, modularity, and module reusability. Our experiments demonstrate that module reuse in modular NNs significantly improves learning efficiency and generalization. Furthermore, we find that module reuse enables modular NNs to excel in data-scarce scenarios by promoting functional specialization within modules and reducing redundancy.
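To make the architectural comparison in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' implementation) contrasting a monolithic dense MLP with a hierarchically modular network in which a single small MLP module is reused at every node of a fixed two-level function graph. The class names, layer sizes, and the specific four-input graph are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: dense monolithic MLP vs. a hierarchically modular
# network with module reuse. Shapes and sizes are illustrative only.

class DenseMLP(nn.Module):
    """Monolithic baseline: assumes no prior structural knowledge."""
    def __init__(self, in_dim=4, hidden=64, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)


class ModularNetWithReuse(nn.Module):
    """Hierarchically modular network: one shared MLP module is applied at
    every node of a two-level binary function graph, so parameters do not
    grow with the number of nodes (module reuse)."""
    def __init__(self, module_hidden=16):
        super().__init__()
        # Single shared module mapping two scalar inputs to one scalar output.
        self.shared_module = nn.Sequential(
            nn.Linear(2, module_hidden), nn.ReLU(),
            nn.Linear(module_hidden, 1),
        )

    def forward(self, x):
        # x has shape (batch, 4): four scalar inputs to the function graph.
        a = self.shared_module(x[:, 0:2])  # left sub-function
        b = self.shared_module(x[:, 2:4])  # right sub-function (same weights)
        return self.shared_module(torch.cat([a, b], dim=1))  # root node (reused again)


if __name__ == "__main__":
    x = torch.randn(8, 4)
    print(DenseMLP()(x).shape)             # torch.Size([8, 1])
    print(ModularNetWithReuse()(x).shape)  # torch.Size([8, 1])
```

In this sketch, the modular variant encodes sparsity (each module sees only two inputs), modularity (the computation is decomposed per graph node), and reuse (one module's weights are shared across nodes), which is the spectrum of structural knowledge the paper varies; the actual architectures are described in the paper and released code.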
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank the reviewers for their feedback, which helped improve the paper. The changes include:
1. Introduction – revised to position the work as a systematic analysis of what kind and how much structural knowledge benefits neural networks, going beyond the general principle that "structure helps."
2. Introduction and conclusion – explicitly acknowledged the study's scope, stating the use of simple functions with fully known hierarchical and modular structure.
3. Introduction – clarified early on that all architectures, including modules in hierarchically modular networks, are implemented as MLPs.
4. Introduction – added justification for the use of simple functions, emphasizing that this design choice enables controlled evaluation of sparsity, modularity, and module reuse without confounding factors.
5. Section 2.1 – expanded to include a definition of truth tables alongside the description of function graphs.
6. Figure 4 – modified per reviewer feedback to add dense monolithic NN results to the second row for direct comparison.
7. Appendix Section B – clarified the definitions and GPU implementation details for sparse neural networks and hierarchically modular networks, explicitly noting computational considerations.
Code: https://github.com/ShreyasMalakarjunPatil/modular-NNs
Assigned Action Editor: ~Gintare_Karolina_Dziugaite1
Submission Number: 4725