Deep Equilibrium Multimodal Fusion

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Multimodal Fusion, Multimodal Learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Multimodal fusion integrates the complementary information present in multiple modalities and has recently gained much attention. Prior fusion approaches point to three key elements of informative multimodal fusion, *i.e.*, stabilizing unimodal signals, capturing intra- and inter-modality interactions at multiple levels, and perceiving modality importance dynamically. Current fusion methods mostly satisfy only one of these conditions rather than addressing all three simultaneously. Encapsulating these ideas, we propose a novel deep equilibrium (DEQ) method for multimodal fusion that seeks a fixed point of the dynamic multimodal fusion process and models feature correlations in an adaptive, recursive manner, naturally consolidating the three key ingredients of successful multimodal fusion. Our approach encodes and stabilizes rich information within and across modalities from low level to high level, dynamically perceives modality importance for effective downstream multimodal learning, and is readily pluggable into various multimodal frameworks. Extensive experiments on four well-known multimodal benchmarks, namely BRCA, MM-IMDB, CMU-MOSI, and VQA-v2, covering a wide variety of modalities, demonstrate the superiority and generalizability of our DEQ fusion, which consistently achieves state-of-the-art performance on these benchmarks. The code will be released.
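To make the fixed-point idea in the abstract concrete, here is a minimal, illustrative sketch of a deep-equilibrium-style fusion step: the fused representation `z` is defined implicitly as the fixed point of an update that mixes `z` with each modality's features, found by simple forward iteration. All function and variable names are hypothetical; this is not the authors' implementation, and real DEQ models typically use root-finding solvers and implicit differentiation rather than plain iteration.

```python
import numpy as np

def deq_fusion(features, W_z, W_xs, b, tol=1e-6, max_iter=200):
    """Illustrative fixed-point fusion (hypothetical, not the paper's code).

    Iterates z <- tanh(W_z @ z + sum_i W_i @ x_i + b) until convergence,
    so the returned z* satisfies z* = f(z*, x_1, ..., x_m) approximately.
    """
    z = np.zeros_like(b)
    for _ in range(max_iter):
        z_next = np.tanh(W_z @ z + sum(W @ x for W, x in zip(W_xs, features)) + b)
        if np.linalg.norm(z_next - z) < tol:  # reached (approximate) equilibrium
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
d = 8
# Keeping W_z small makes the update contractive, so the iteration converges.
W_z = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)
W_xs = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(2)]  # two modalities
b = rng.standard_normal(d)
xs = [rng.standard_normal(d) for _ in range(2)]  # per-modality feature vectors

z_star = deq_fusion(xs, W_z, W_xs, b)
```

The key design point this sketch illustrates: because `z_star` is defined by an equilibrium condition rather than a fixed number of fusion layers, it implicitly aggregates interactions at arbitrary depth, which is how DEQ-style fusion can capture multi-level intra- and inter-modality correlations with a single weight-tied update.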
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6765