Multimodal Meta-learning of Implicit Neural Representations with Iterative Adaptation

19 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Implicit Neural Representations, Meta Learning, Multimodal Learning, Optimization
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Gradient-based meta-learning is gaining prominence in implicit neural representations (INRs) as a way to accelerate convergence. However, existing approaches primarily concentrate on meta-learning $\textit{weight initialization}$ for $\textit{unimodal}$ signals. This focus falls short when data is scarce, as noisy gradients from a small set of observations can hinder convergence and trigger overfitting. Moreover, real-world data often stems from joint multimodal distributions, which share common or complementary information across modalities. This presents an opportunity to enhance convergence and performance, particularly when dealing with limited data. Unfortunately, existing methods do not fully exploit this potential due to their main focus on unimodal setups. In this work, we introduce a novel optimization-based meta-learning framework, Multimodal Iterative Adaptation (MIA), that addresses these limitations. MIA fosters continuous interaction among independent unimodal INR learners, enabling them to capture cross-modal relationships and refine their understanding of signals through iterative optimization steps. To achieve this goal, we introduce additional meta-learned modules, dubbed State Fusion Transformers (SFTs). Our SFTs are meta-learned to aggregate the states of the unimodal learners ($\textit{e.g.}$, parameters and gradients), capture their potential cross-modal interactions, and utilize this knowledge to provide enhanced weight updates and guidance to the unimodal learners. In experiments, we demonstrate that MIA significantly improves the modeling capabilities of unimodal meta-learners, achieving substantial gains in generalization and memorization performance over unimodal baselines across a variety of multimodal signals, ranging from 1D synthetic functions to real-world vision, climate, and audiovisual data.
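As a rough illustration of the adaptation scheme outlined in the abstract, the sketch below shows a minimal, first-order MIA-style inner loop: each unimodal INR takes a few gradient steps while a shared Transformer encoder, standing in for a State Fusion Transformer, fuses pooled parameter/gradient statistics from all learners and modulates each modality's update. All module names (`SimpleINR`, `StateFusionTransformer`, `mia_inner_loop`), the state summaries, and the scalar update scaling are illustrative assumptions, not the authors' implementation, which the abstract does not specify in detail.

```python
# Minimal, first-order sketch of an MIA-style inner loop (illustrative only).
import torch
import torch.nn as nn


class SimpleINR(nn.Module):
    """A small coordinate MLP standing in for one unimodal INR learner."""
    def __init__(self, in_dim=1, hidden=64, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, coords):
        return self.net(coords)


class StateFusionTransformer(nn.Module):
    """Fuses per-modality learner states (here, pooled parameter/gradient
    statistics) with self-attention and emits a per-modality update scale."""
    def __init__(self, state_dim=2, embed_dim=64):
        super().__init__()
        self.embed = nn.Linear(state_dim, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.to_scale = nn.Linear(embed_dim, 1)

    def forward(self, states):              # states: (1, num_modalities, state_dim)
        fused = self.encoder(self.embed(states))
        return torch.sigmoid(self.to_scale(fused))   # (1, num_modalities, 1)


def mia_inner_loop(inrs, sft, batches, inner_steps=3, base_lr=1e-2):
    """Adapt each unimodal INR for a few steps; the SFT sees the states of all
    learners jointly and modulates each modality's gradient update."""
    for _ in range(inner_steps):
        losses, grads, summaries = [], [], []
        for inr, (coords, targets) in zip(inrs, batches):
            loss = nn.functional.mse_loss(inr(coords), targets)
            g = torch.autograd.grad(loss, list(inr.parameters()))
            grads.append(g)
            # Cheap per-modality state summary: mean |parameter| and mean |gradient|.
            p_stat = torch.stack([p.abs().mean() for p in inr.parameters()]).mean()
            g_stat = torch.stack([gi.abs().mean() for gi in g]).mean()
            summaries.append(torch.stack([p_stat, g_stat]))
            losses.append(loss)
        # Fuse states across modalities, then apply SFT-modulated gradient steps.
        scales = sft(torch.stack(summaries).unsqueeze(0))
        with torch.no_grad():
            for m, (inr, g) in enumerate(zip(inrs, grads)):
                for p, gi in zip(inr.parameters(), g):
                    p -= base_lr * scales[0, m, 0] * gi
    return losses


if __name__ == "__main__":
    torch.manual_seed(0)
    inrs = [SimpleINR(), SimpleINR()]                    # two toy modalities
    sft = StateFusionTransformer()
    coords = torch.linspace(-1, 1, 128).unsqueeze(-1)
    batches = [(coords, torch.sin(3 * coords)), (coords, torch.cos(3 * coords))]
    final_losses = mia_inner_loop(inrs, sft, batches)
    print([round(l.item(), 4) for l in final_losses])
```

In a full meta-learning setup, the SFT and the INR initializations would be meta-trained in an outer loop by backpropagating through these inner-loop updates; that outer loop is omitted here for brevity.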
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1819