Keywords: Physics-inspired, Multimodal Fusion, Poisson-Nernst-Planck Equation, Optimization Objective Redefinition, Feature Disentanglement, Downstream Task Adaptation
TL;DR: A generalized multimodal fusion model based on the Poisson-Nernst-Planck equation that improves fusion performance.
Abstract: Despite recent progress, current multimodal fusion methods still face three practical issues: gradient interference between task and fusion objectives, fragility under missing modalities, and rigidity from enforcing uniform feature dimensions across modalities.
We present Physics-Inspired Multimodal Reconfiguration (PMR), a structured prior for fusion inspired by the Poisson–Nernst–Planck (PNP) equations. PMR draws on two PNP principles, conservation and flow driven by a single potential, and embeds them as (i) an information-preservation regularizer and (ii) a unified scalar potential that shapes gradient updates, mitigating interference between task and fusion objectives. This unified potential also drives the disentanglement of shared and modality-specific subspaces. A three-stage mapping (dissolution → dissociation → concentration) instantiates the prior to separate and reconstruct features, which improves robustness to missing modalities and naturally supports unequal feature dimensions across modalities.
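For readers unfamiliar with PNP, the two principles named above can be illustrated with the physical equations themselves. In the Nernst–Planck part, each species with concentration c and valence z moves with flux J = -D(∇c + z c ∇φ), and the continuity equation ∂c/∂t = -∇·J enforces conservation; the Poisson part ties the single potential φ to the charge density. The sketch below is a hypothetical illustration of these physics only, not PMR's implementation. It assumes a 1D, nondimensionalized setting, holds φ fixed as a linear ramp (omitting the Poisson solve), and uses an explicit finite-volume scheme with zero-flux walls. The names `nernst_planck_step` and `simulate` are invented for this example.

```python
import numpy as np


def nernst_planck_step(c, phi, z, D, dx, dt):
    """One explicit finite-volume step of the 1D Nernst-Planck equation.

    dc/dt = -dJ/dx with flux J = -D * (dc/dx + z * c * dphi/dx), in
    nondimensional units (thermal voltage folded into phi). Zero-flux
    walls make the update conserve total mass up to floating-point rounding.
    """
    c_face = 0.5 * (c[:-1] + c[1:])          # concentration at interior faces
    grad_c = np.diff(c) / dx                  # diffusion gradient at faces
    grad_phi = np.diff(phi) / dx              # shared potential gradient at faces
    j_inner = -D * (grad_c + z * c_face * grad_phi)
    j = np.concatenate(([0.0], j_inner, [0.0]))  # no flux through the walls
    return c - dt / dx * np.diff(j)           # flux divergence per cell


def simulate(n=50, steps=400, D=1.0, dt=1e-4, slope=5.0):
    """Evolve two oppositely charged species under ONE fixed potential.

    Hypothetical illustration: phi is held fixed (no Poisson solve), and
    dt satisfies the explicit diffusion limit D * dt / dx**2 <= 0.5.
    """
    dx = 1.0 / n
    x = (np.arange(n) + 0.5) * dx             # cell centers on [0, 1]
    phi = slope * x                           # single scalar potential
    valence = {"cation": 1, "anion": -1}
    c = {k: np.exp(-((x - 0.5) ** 2) / 0.01) for k in valence}
    mass0 = {k: c[k].sum() * dx for k in valence}
    for _ in range(steps):
        for k, z in valence.items():
            c[k] = nernst_planck_step(c[k], phi, z, D, dx, dt)
    return {
        k: {
            "mass0": mass0[k],
            "mass": c[k].sum() * dx,
            "center": (x * c[k]).sum() / c[k].sum(),
        }
        for k in valence
    }


if __name__ == "__main__":
    for name, r in simulate().items():
        print(f"{name}: mass {r['mass0']:.6f} -> {r['mass']:.6f}, "
              f"center of mass {r['center']:.3f}")
```

Running it should show each species' total mass unchanged while the same potential pushes the cation's center of mass below 0.5 and the anion's above it. How PMR maps such dynamics onto feature subspaces is specified in the paper and is not reproduced by this toy example.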
Across audio, image, video, and text, PMR consistently outperforms competitive baselines on classification and cross-modal retrieval, demonstrating the efficacy of a physics-inspired hybrid prior for multimodal learning.
Supplementary Material: zip
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 4638