Keywords: black-box neural network
Abstract: Despite the considerable advancements in Deep Neural Networks (DNNs), their intrinsic opacity, rooted in their foundational design, remains a challenge.
In this study, we identify a novel
phenomenon wherein the representation induced by cumulative gradients (the
aggregate change accumulated over iterative gradient updates) exhibits a certain
independence from the initialization at which the gradients were computed.
This implies that learned gradients can be assigned to other
arbitrarily initialized yet adequately trained neural networks, while
retaining a representation comparable to that of the original network.
This occurrence is counterintuitive and cannot be fully explained by existing optimization theories.
Additionally, we observe that the learned model weights can also be
reassigned to different neural networks.
In essence, these learned gradients can be viewed as a neural network with analogous representations.
Furthermore, this reassignment of gradients and model weights can potentially mitigate catastrophic forgetting when learning multiple tasks. We provide a theoretical framework to support this claim. Our extensive experiments clearly illustrate this phenomenon and its potential to mitigate catastrophic forgetting.
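The reassignment described above can be sketched mechanically: train a model from one initialization, record the cumulative gradient (final weights minus initial weights), and add that same displacement to a different random initialization. The sketch below illustrates only these mechanics on a toy linear-regression model with plain NumPy; the function and variable names are hypothetical, and the sketch does not reproduce the paper's empirical finding for deep networks.

```python
import numpy as np

# Toy regression data (hypothetical setup for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true

def train(w0, lr=0.05, steps=500):
    """Plain gradient descent on mean squared error; returns final weights."""
    w = w0.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Network A: train from one random initialization
w0_A = rng.normal(size=4)
w_A = train(w0_A)

# "Cumulative gradient": the aggregate change accumulated during training
delta = w_A - w0_A

# Reassign the cumulative gradient to a differently initialized network B;
# the paper's claim is that this yields a comparable representation for DNNs
w0_B = rng.normal(size=4)
w_B = w0_B + delta
```

For this convex toy problem the trained weights converge to the true solution regardless of initialization; the paper's contribution is the empirical observation that an analogous transfer of cumulative gradients behaves comparably for non-convex deep networks.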
Primary Area: learning theory
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7466