Abstract: Multimodal learning benefits from information in multiple modalities, and
each learned modal representation can be divided into uni-modal features
that can be learned from uni-modal training and paired-modal features that can be learned from cross-modal interaction. Building
on this perspective, we propose a partitioner-guided modal learning framework, PgM, which consists of a modal partitioner,
a uni-modal learner, a paired-modal learner, and a uni-paired modal decoder. The modal partitioner segments the learned modal representation
into uni-modal and paired-modal features. The modal learner incorporates two dedicated components for uni-modal and paired-modal
learning. The uni-paired modal decoder reconstructs the modal representation from the uni-modal and paired-modal features. PgM offers
three key benefits: 1) thorough learning of uni-modal and paired-modal features, 2) flexible distribution adjustment for uni-modal
and paired-modal representations to suit diverse downstream tasks,
and 3) different learning rates across modalities and partitions. Extensive experiments demonstrate the effectiveness of PgM across
four multimodal tasks and further highlight its transferability
to existing models. Additionally, we visualize the distribution of
uni-modal and paired-modal features across modalities and tasks,
offering insights into their respective contributions.
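
To make the partition-then-reconstruct idea concrete, the following is a minimal sketch and not the PgM implementation. The abstract does not specify how the partitioner or decoder work, so the soft sigmoid gate, the additive decoder, and the names `partition` and `decode` are all illustrative assumptions.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def partition(representation, gate_logits):
    """Hypothetical partitioner: split a representation into a uni-modal
    part and a paired-modal part using a per-dimension soft gate.
    (The gating scheme is an assumption, not taken from the paper.)"""
    gates = [sigmoid(z) for z in gate_logits]
    uni = [g * r for g, r in zip(gates, representation)]
    paired = [(1.0 - g) * r for g, r in zip(gates, representation)]
    return uni, paired


def decode(uni, paired):
    """Hypothetical decoder: reconstruct the representation by summing
    the two parts. A learned decoder would replace this in practice."""
    return [u + p for u, p in zip(uni, paired)]


rep = [0.5, -1.0, 2.0]
uni, paired = partition(rep, [0.0, 2.0, -2.0])
recon = decode(uni, paired)
```

With this toy gate, shifting the logits changes how much of each dimension is assigned to the uni-modal versus the paired-modal part, which loosely mirrors the adjustable distribution the abstract mentions.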