Block-operations: Creating an Inductive Bias to Route Data and Reuse Subnetworks

18 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Neural Network Architecture, FNN, Data Representation, Generalization, Routing, Inductive Bias, Negative interference
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A novel mechanism for grouping neurons and a module based on it, which results in more consistent data representation, less negative interference, and better transfer learning.
Abstract: Feedforward Neural Networks (FNNs) often suffer from poor generalization because they fail to develop and reuse subnetworks for related tasks. Csordás et al. (2020) suggest that this may be because FNNs are more likely to learn new mappings than to copy and route activation patterns without altering them. To tackle this problem, we propose the concept of block-operations: learnable functions that group neurons into larger semantic units and operate on these blocks, with routing as a primitive operation. As a first step, we introduce the Multiplexer, a new architectural component that enhances the FNN by adding block-operations to it. We experimentally verified that the Multiplexer exhibits several desirable properties compared to the FNN it replaces: it represents concepts consistently with the same neuron activation patterns throughout the network, suffers less from negative interference, shows an increased propensity for specialization and transfer learning, can more easily reuse learned subnetworks for new tasks, and is particularly effective at learning algorithmic tasks with conditional logic. In several cases, the Multiplexer achieved 100% OOD generalization on our tasks, where FNNs only learned correlations that failed to generalize. Our results suggest that block-operations are a promising direction for future research. Adapting more complex architectures than the FNN to make use of them could lead to increased compositionality and better generalization.
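
The abstract describes block-operations only at a conceptual level: neurons are grouped into blocks, and the network can either copy and route an existing block unchanged or compute a new one. The following minimal PyTorch sketch is one possible way such a routing layer could look; it is an illustration under assumptions, not the paper's actual Multiplexer, and all class names, dimensions, and the soft softmax routing are hypothetical.

```python
# Illustrative sketch only: an FNN-like layer whose output blocks are a learned,
# softmax-weighted choice between (a) input blocks routed through unchanged and
# (b) a freshly computed block. Names and design details are assumptions.
import torch
import torch.nn as nn


class MultiplexerSketch(nn.Module):
    def __init__(self, num_in_blocks: int, num_out_blocks: int, block_size: int):
        super().__init__()
        self.num_in_blocks = num_in_blocks
        self.num_out_blocks = num_out_blocks
        self.block_size = block_size
        d_in = num_in_blocks * block_size
        # One candidate "new" block per output slot, computed by an ordinary FNN.
        self.new_block = nn.Sequential(
            nn.Linear(d_in, d_in), nn.ReLU(),
            nn.Linear(d_in, num_out_blocks * block_size),
        )
        # Routing logits: each output block selects among the input blocks
        # (copied unchanged) or the newly computed block.
        self.router = nn.Linear(d_in, num_out_blocks * (num_in_blocks + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_in_blocks * block_size)
        b = x.shape[0]
        in_blocks = x.view(b, self.num_in_blocks, self.block_size)
        new_blocks = self.new_block(x).view(b, self.num_out_blocks, self.block_size)
        logits = self.router(x).view(b, self.num_out_blocks, self.num_in_blocks + 1)
        weights = torch.softmax(logits, dim=-1)  # soft routing decision per output block
        out = []
        for k in range(self.num_out_blocks):
            # Candidates: every input block copied as-is, plus one computed block.
            candidates = torch.cat([in_blocks, new_blocks[:, k:k + 1]], dim=1)
            out.append((weights[:, k].unsqueeze(-1) * candidates).sum(dim=1))
        return torch.stack(out, dim=1).reshape(b, -1)
```

In this sketch, a routing weight concentrated on an input block reproduces that block's activation pattern verbatim at the output, which is the kind of copy-and-route behavior the abstract argues plain FNNs are reluctant to learn.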
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1239