Abstract: To tackle high-dimensional data with multiple representations, multi-view unsupervised feature selection has emerged as a significant learning paradigm.
However, previous methods suffer from the following dilemmas:
(i) The emphasis is on selecting features to preserve the similarity structure of data, while neglecting the discriminative information in the cluster structure;
(ii) They often impose the orthogonal constraint on the pseudo cluster labels, disrupting the locality in the cluster label space;
(iii) Learning the similarity or cluster structure from all samples is time-consuming.
To this end, a Scalable Multi-view Unsupervised Feature Selection with structure learning and fusion (SMUFS) is proposed to jointly exploit the cluster structure and the similarity relations of data.
Specifically, SMUFS introduces the sample-view weights to adaptively fuse the membership matrices that indicate cluster structures and serve as the pseudo cluster labels, such that a unified membership matrix across views can be effectively obtained to guide feature selection. Meanwhile, SMUFS performs graph learning from the membership matrix, preserving the locality of cluster labels and improving their discriminative capability.
Further, an acceleration strategy has been developed to make SMUFS scalable for relatively large-scale data.
A convergent solution is devised to optimize the formulated problem, and extensive experiments demonstrate the effectiveness and superiority of SMUFS.
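To make the fusion step in the abstract concrete, the following is a minimal sketch, not the authors' formulation. It assumes each view provides an n-by-c membership matrix with rows summing to 1, and that the sample-view weights form an n-by-V matrix with rows summing to 1. The function names, the inner-product similarity, and the k-nearest-neighbor sparsification are illustrative assumptions; in SMUFS the weights, memberships, and graph are learned jointly by optimization rather than fixed as they are here.

```python
import numpy as np


def fuse_memberships(memberships, weights):
    """Fuse per-view membership matrices with per-sample, per-view weights.

    memberships: list of V arrays, each of shape (n, c)  (hypothetical input)
    weights:     array of shape (n, V), rows sum to 1    (hypothetical input)
    Returns a unified (n, c) membership matrix.
    """
    stacked = np.stack(memberships, axis=1).astype(float)  # shape (n, V, c)
    weights = np.asarray(weights, dtype=float)
    # For every sample, take the weighted average of its V membership rows.
    return np.einsum("nv,nvc->nc", weights, stacked)


def membership_knn_graph(fused, k):
    """Build a symmetric k-nearest-neighbor graph from the fused memberships.

    Similarity is the inner product of membership rows (an assumption made
    for illustration only).
    """
    n = fused.shape[0]
    sim = fused @ fused.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from neighbors
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]  # k most similar samples
    rows = np.arange(n)[:, None]
    graph = np.zeros((n, n))
    graph[rows, idx] = sim[rows, idx]
    return (graph + graph.T) / 2.0  # symmetrize


# Usage with random data: 6 samples, 3 clusters, 2 views.
rng = np.random.default_rng(0)
views = [rng.random((6, 3)) for _ in range(2)]
views = [m / m.sum(axis=1, keepdims=True) for m in views]
w = rng.random((6, 2))
w = w / w.sum(axis=1, keepdims=True)

fused = fuse_memberships(views, w)
graph = membership_knn_graph(fused, k=2)
```

Because both the memberships and the weights are row-normalized, each row of the fused matrix also sums to 1, so it can still be read as a soft cluster assignment.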
Primary Subject Area: [Content] Multimodal Fusion
Secondary Subject Area: [Content] Multimodal Fusion
Relevance To Conference: Multimodal fusion has become an important task with the development of data acquisition technology. Because real-world data collected from practical applications typically contain irrelevant features and are often high-dimensional, direct processing can be inefficient and may degrade performance.
Therefore, this paper focuses on multi-view unsupervised feature selection, which aims to select a compact subset of salient features without the guidance of labels and to facilitate downstream tasks (e.g., clustering). Existing feature selection methods make insufficient use of clustering information and suffer from high time complexity.
To this end, our proposed method simultaneously explores the clustering structure and the similarity structure to enhance the quality of the selected features, and we further design an acceleration strategy to extend its applicability to large datasets. Our proposed method effectively integrates the information from multiple views and addresses several problems within the domain of multi-view feature selection.
Supplementary Material: zip
Submission Number: 3086