Dynamic Multi-Modal Representation Learning For Topic Modeling

Hongzhang Mu, Shuili Zhang, Quangang Li, Tingwen Liu, Hongbo Xu

Published: 2024, Last Modified: 21 Jan 2026, ICME 2024, CC BY-SA 4.0

Abstract: Topic modeling aims to identify and group related topics within collections of text documents. As multi-modal data becomes increasingly prevalent and useful, researchers are exploring ways to integrate other data types (e.g., images) into topic modeling. However, existing topic modeling methods struggle both with irrelevant modal features (e.g., visual features unrelated to the text) and with efficiency. In this paper, we propose a dynamic multi-modal representation learning method (DMMR) that adaptively integrates multi-modal features to make the modeling of multi-modal data both more effective and more efficient. Concretely, a gating network makes a modality-level decision for each sample: use the text, the image, or both. Based on this sample-wise choice, DMMR fuses the multi-modal features encoded by modality-specific expert networks. Through this dynamic modality selection and fusion, the most representative modal features are kept, irrelevant modalities are excluded, and inference is sped up. Extensive experiments on public datasets demonstrate that the proposed method significantly improves topic quality (e.g., coherence and diversity) and increases efficiency by 20.37%.
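The abstract does not specify DMMR's architecture, so the following is only a minimal NumPy sketch of the general idea of sample-wise modality gating with modality-specific experts, under several assumptions: the gate is a single linear layer over concatenated raw text and image features that makes a hard argmax choice among "text", "image", and "both"; each expert is one linear layer followed by tanh; and "both" is fused by averaging the two expert outputs. The names `DynamicFusionSketch` and `CHOICES`, and all dimensions, are hypothetical. The point it illustrates is that each expert runs only on the samples that selected it, which is where an inference-time saving could come from.

```python
import numpy as np

# Hypothetical label order for the gate's three outputs.
CHOICES = ("text", "image", "both")


class DynamicFusionSketch:
    """Illustrative sketch only; not the DMMR implementation."""

    def __init__(self, d_text, d_image, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Assumed gating network: one linear layer over concatenated raw features.
        self.W_gate = rng.normal(scale=0.1, size=(d_text + d_image, len(CHOICES)))
        # Assumed modality-specific experts: one linear layer + tanh each.
        self.W_text = rng.normal(scale=0.1, size=(d_text, d_hidden))
        self.W_image = rng.normal(scale=0.1, size=(d_image, d_hidden))

    def gate(self, x_text, x_image):
        # Hard, sample-wise choice: 0 = text, 1 = image, 2 = both.
        logits = np.concatenate([x_text, x_image], axis=1) @ self.W_gate
        return logits.argmax(axis=1)

    def forward(self, x_text, x_image):
        choice = self.gate(x_text, x_image)
        out = np.zeros((x_text.shape[0], self.W_text.shape[1]))
        use_text = choice != 1   # "text" or "both"
        use_image = choice != 0  # "image" or "both"
        # Each expert is evaluated only on the samples that selected it.
        out[use_text] += np.tanh(x_text[use_text] @ self.W_text)
        out[use_image] += np.tanh(x_image[use_image] @ self.W_image)
        # Assumed fusion for "both": mean of the two expert outputs.
        out[choice == 2] /= 2.0
        return out, choice


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = DynamicFusionSketch(d_text=8, d_image=6, d_hidden=4)
    x_t = rng.normal(size=(5, 8))
    x_v = rng.normal(size=(5, 6))
    z, choice = model.forward(x_t, x_v)
    print(z.shape, [CHOICES[c] for c in choice])
```

The hard argmax keeps the sketch focused on inference; a trainable version would need a differentiable or relaxed gating scheme, and how DMMR trains its gate is not described in the abstract.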