Abstract: Multi-view representation learning methods typically follow a consistent-and-specific pipeline that extracts latent representations of an entity from its multiple observable views to facilitate downstream tasks. However, most of them overlook the complex underlying correlation between different views. To address this issue, we build on a well-known property of neural networks (NNs): they tend to learn simple patterns first and hard ones later. In our setting, view-consistent representations are the simple patterns and view-specific representations are the hard ones. Accordingly, we propose to disentangle view-consistency from view-specificity and learn them gradually. Specifically, we devise a novel curriculum learning approach that guides the whole model to learn view-consistent representations first and then, progressively, view-specific representations. In addition, we equip each view with a learnable prior that allows each view-specific representation to fit its own distribution. Moreover, we incorporate a mixture-of-experts layer and a disentangling module to further enhance the quality of the learned representations. Extensive experiments on five real-world datasets show that the proposed model markedly outperforms its counterparts. The code is available at https://github.com/XLearning-SCU/2025-IJCAI-CL2P.
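As an illustration only, the consistency-first curriculum described above can be pictured as a weight on the view-specific objective that starts at zero and grows during training. The sketch below is not the authors' implementation: the function names, the linear ramp, and the parameters `warmup_epochs` and `ramp_epochs` are all assumptions made for this example; the released code should be consulted for the actual schedule.

```python
def specificity_weight(epoch: int, warmup_epochs: int, ramp_epochs: int) -> float:
    """Hypothetical curriculum weight for the view-specific loss term.

    Returns 0.0 during the warm-up phase (only view-consistency is learned),
    then rises linearly to 1.0 over `ramp_epochs` epochs.
    """
    if epoch < warmup_epochs:
        return 0.0
    if ramp_epochs <= 0:
        # No ramp requested: switch the view-specific term on immediately.
        return 1.0
    progress = (epoch - warmup_epochs) / ramp_epochs
    return min(1.0, progress)


def total_loss(consistency_loss: float, specificity_loss: float, epoch: int,
               warmup_epochs: int = 10, ramp_epochs: int = 20) -> float:
    """Combine the two (assumed) loss terms under the curriculum weight."""
    weight = specificity_weight(epoch, warmup_epochs, ramp_epochs)
    return consistency_loss + weight * specificity_loss
```

With these assumed defaults, the view-specific term contributes nothing for the first 10 epochs and reaches full weight at epoch 30.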