A Knowledge-Driven Approach to Enhance Topic Modeling with Multi-Modal Representation Learning

Published: 01 Jan 2024, Last Modified: 22 Jul 2025, ICMR 2024, CC BY-SA 4.0
Abstract: Multi-modal topic models aim to integrate semantic information from multi-modal data to generate more precise topics. Existing topic modeling methods, however, still face challenges in topic diversity and effectiveness. To address this issue, most current approaches focus on modeling the correlation among multiple multi-modal sources, while little emphasis has been placed on fine-grained feature representation and structured knowledge. We therefore propose a fine-grained prompt representation method. Specifically, we adopt a dual-stream structure in which a pre-trained language model and an image model are combined in parallel to construct a multi-modal model. We then enhance the structured representation by integrating fine-grained scene graph knowledge through a Knowledge-Enhanced Encoder, which is constructed from the scene graph. In evaluating the proposed framework, we find that it significantly improves topic quality (such as coherence and diversity). On publicly available datasets, our approach outperforms state-of-the-art multi-modal topic models.
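The abstract names topic diversity as one of the quality measures but does not say how it is computed. As a minimal sketch, assuming the common definition of diversity as the fraction of unique words among the top-k words of all topics, the measure could look like the following. The function name `topic_diversity`, the default `top_k`, and the example topics are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 25) -> float:
    """Return the fraction of unique words among the top_k words of all topics.

    Assumption: each topic is a list of words already sorted by descending
    probability. A value near 1.0 means topics share few words; a value near
    0.0 means topics are highly redundant.
    """
    # Collect the top_k words of every topic into one flat list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("topics must contain at least one word")
    # Unique words divided by total words considered.
    return len(set(top_words)) / len(top_words)


# Hypothetical topics for illustration only.
example_topics = [
    ["dog", "cat", "pet", "animal"],
    ["car", "road", "drive", "engine"],
    ["dog", "park", "walk", "leash"],
]
score = topic_diversity(example_topics, top_k=4)
```

In this toy example only the word "dog" is shared between topics, so the score is high. Whether the paper uses this exact definition is not stated in the abstract.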