Weighted Subspace Graph Learning for High-Dimensional Data

Guojie Li, Zhiwen Yu, Ziwei Fan, Kaixiang Yang, C. L. Philip Chen

Published: 01 Jan 2026, Last Modified: 22 Jan 2026, Venue: IEEE Transactions on Knowledge and Data Engineering, License: CC BY-SA 4.0

Abstract: Graph-based clustering has been extensively explored and applied due to its exceptional performance. However, most existing methods operate directly in the original high-dimensional space, where complex nonlinear structures and redundant noisy features often obscure the intrinsic data distribution. Consequently, constructing a reliable similarity graph in such a space is inherently challenging, as uncertainty and noise can significantly degrade clustering performance. To address this issue, this paper proposes a novel graph-based clustering method, Weighted Subspace Graph Learning (WSGL). Specifically, WSGL leverages kernel principal component analysis (Kernel PCA) to construct multiple kernel-based subspaces, effectively capturing nonlinear structures while reducing redundancy and noise. This strategy enhances subspace features from different perspectives, providing a more comprehensive understanding of the data distribution. Next, WSGL learns pairwise relationships across these subspaces, fully exploiting their complementary information to mitigate the limitations of relying on a single original space for capturing the global data structure. Furthermore, to ensure that the learned similarity graph preserves the same number of connected components as the ground-truth clusters, we impose a low-rank constraint on the graph structure. Additionally, considering the varying quality of different subspaces, WSGL introduces a dynamic weighting mechanism that adaptively assigns weights to subspaces based on their contribution to clustering performance, allowing high-quality subspaces to play a more dominant role in the final clustering results. Extensive experiments on multiple high-dimensional datasets demonstrate that WSGL surpasses state-of-the-art methods, validating its effectiveness and superiority in complex clustering tasks.
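
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the abstract gives no objective function or update rules, so the RBF kernel, the k-nearest-neighbor graph construction, and the inverse-distance reweighting rule are assumptions chosen for illustration, and all function names and parameters are hypothetical. The low-rank constraint is not enforced here; `count_components` only checks the property that constraint targets, using the standard fact that the number of connected components of a graph equals the multiplicity of the zero eigenvalue of its Laplacian.

```python
import numpy as np

def sq_dists(X):
    # Pairwise squared Euclidean distances, clipped at zero for stability.
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def kernel_pca(X, gamma, n_components):
    # RBF Kernel PCA on the training points (assumed kernel choice).
    n = X.shape[0]
    K = np.exp(-gamma * sq_dists(X))
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    vals, vecs = np.linalg.eigh(H @ K @ H)   # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 1e-12))

def knn_graph(Z, k):
    # Symmetric k-nearest-neighbor graph with Gaussian edge weights.
    n = Z.shape[0]
    d2 = sq_dists(Z)
    np.fill_diagonal(d2, np.inf)             # exclude self-loops
    nn = np.argsort(d2, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nn.ravel()
    sigma2 = np.mean(d2[rows, cols]) + 1e-12
    A = np.zeros((n, n))
    A[rows, cols] = np.exp(-d2[rows, cols] / sigma2)
    return (A + A.T) / 2.0

def fuse_graphs(graphs, n_iter=20):
    # Assumed reweighting rule: subspace graphs closer to the consensus
    # graph receive larger weights. This is a stand-in for the paper's
    # dynamic weighting mechanism, which the abstract does not specify.
    w = np.full(len(graphs), 1.0 / len(graphs))
    for _ in range(n_iter):
        S = sum(wi * A for wi, A in zip(w, graphs))
        w = np.array([1.0 / (2.0 * np.linalg.norm(S - A) + 1e-12) for A in graphs])
        w /= w.sum()
    S = sum(wi * A for wi, A in zip(w, graphs))
    return S, w

def count_components(S, tol=1e-8):
    # Number of (near-)zero eigenvalues of the graph Laplacian L = D - S.
    L = np.diag(S.sum(axis=1)) - S
    return int(np.sum(np.linalg.eigvalsh(L) < tol))

def wsgl_sketch(X, scales=(0.5, 1.0, 2.0), n_components=5, k=5):
    # One kernel subspace per bandwidth, one graph per subspace, then fusion.
    med = np.median(sq_dists(X)) + 1e-12
    graphs = [knn_graph(kernel_pca(X, s / med, n_components), k) for s in scales]
    return fuse_graphs(graphs)
```

Calling `S, w = wsgl_sketch(X)` on an `(n, d)` array returns a fused `n × n` similarity graph and one weight per subspace; `count_components(S)` can then be compared against the desired number of clusters. In the sketch, the bandwidths are scaled by the median squared distance so that the kernels stay informative when `d` is large.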

External IDs: doi:10.1109/tkde.2026.3656436