Deep multi-view document clustering with enhanced semantic embedding

Published: 01 Jan 2021, Last Modified: 30 Sept 2024 · Inf. Sci. 2021 · CC BY-SA 4.0
Abstract: Multi-view clustering, which aims to group data described by multiple views, has recently attracted intense research attention. Text documents make multi-view clustering harder because document views are sparse, high-dimensional, and often inconsistent with one another. In this paper, we introduce a novel model for multi-view document clustering with enhanced semantic embedding, namely MDCE, to address these difficulties when clustering text documents that have more than one representation view. Enhanced semantic embedders are designed to learn and improve the semantic mapping from the higher-dimensional document space to a lower-dimensional feature space using complementary semantic information. Specifically, three types of complementary semantic information are exploited in an unsupervised manner: neighbour-wise, view-wise, and cluster-wise complementary information. A deep network is designed to simultaneously optimize the enhanced semantic mapping, integrate the lower-dimensional features from multiple views, and discover document clustering assignments. We conducted extensive experiments on realistic datasets, comparing the proposed MDCE model against a number of state-of-the-art multi-view clustering approaches. Experimental results demonstrate that the MDCE-related models perform substantially better than all other models.
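The overall data flow described in the abstract (map each view to a lower-dimensional space, integrate the views, assign clusters) can be illustrated with a small sketch. This is not the MDCE model: the learned enhanced semantic embedders are replaced here by fixed linear projections supplied by the caller, the jointly trained deep network is replaced by plain k-means run after fusion, and none of the three kinds of complementary information is modelled. The function names `embed`, `fuse`, and `kmeans`, the weight matrices, and the toy data are all hypothetical and chosen only for illustration.

```python
import math


def embed(view, weights):
    """Map each high-dimensional document row to a unit-length low-dimensional vector.

    Assumption: `weights` is a fixed projection matrix (one row per input
    feature); in a learned model these values would be trained instead.
    """
    out = []
    for row in view:
        # Linear projection: z[j] = sum_i row[i] * weights[i][j]
        z = [sum(x * w[j] for x, w in zip(row, weights)) for j in range(len(weights[0]))]
        norm = math.sqrt(sum(v * v for v in z)) or 1.0  # guard against all-zero rows
        out.append([v / norm for v in z])
    return out


def fuse(embedded_views):
    """Concatenate the per-view embeddings of each document into one vector."""
    return [sum(parts, []) for parts in zip(*embedded_views)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd iterations with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from every centre chosen so far.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (hypothetical): four documents seen through two views.
view_a = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]  # e.g. term counts
view_b = [[1, 0, 0], [2, 0, 1], [0, 0, 2], [0, 1, 3]]              # a second representation
weights_a = [[1, 0], [1, 0], [0, 1], [0, 1]]                       # 4 features -> 2 dims
weights_b = [[1, 0], [0, 1], [0, 1]]                               # 3 features -> 2 dims

fused = fuse([embed(view_a, weights_a), embed(view_b, weights_b)])
print(kmeans(fused, k=2))  # [0, 0, 1, 1]
```

Running the stages one after another keeps the sketch short, but it also marks the main difference from what the abstract describes, where the mapping, the view integration, and the cluster assignments are optimized simultaneously by a single network.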