A Biselection Method Based on Consistent Matrix for Large-Scale Datasets

Published: 01 Jan 2025. Last Modified: 07 Jul 2025. Venue: IEEE Trans. Fuzzy Syst. 2025. License: CC BY-SA 4.0.
Abstract: Biselection (simultaneous feature and sample selection) improves the efficiency and accuracy of machine learning models on large-scale data. Fuzzy rough sets, a mathematical model of uncertainty known for its interpretability, are widely used in machine learning, particularly for feature selection. The consistent matrix has substantially improved the computational efficiency and scalability of fuzzy rough set-based feature selection; however, like most methods in this family, it addresses feature selection alone and rarely incorporates sample selection. This feature-centric focus can limit classification performance, particularly on noisy, large-scale datasets in which both features and samples must be selected judiciously. To overcome this limitation, this article integrates sample selection with feature selection. First, we introduce a $\beta$-consistent granulation method that generates more accurate and concise fuzzy information granules. Second, a novel membership function is employed to identify noise samples and irrelevant features simultaneously. Building on these components, we propose a biselection algorithm with lower computational complexity that selects high-quality features and samples. Numerical experiments show that, compared with eleven representative algorithms, the proposed method achieves an average accuracy improvement of 9.66% and a 933-fold increase in efficiency.
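The abstract does not give the formulas for $\beta$-consistent granulation, the consistent matrix, or the new membership function, so the following is only a generic sketch of how a single fuzzy rough quantity can score both features and samples; it is not the authors' algorithm. Every modelling choice here is an assumption: features scaled to [0, 1], per-feature similarity $1 - |x_a - y_a|$ combined with the min t-norm, a lower approximation that averages (an OWA-style relaxation) rather than minimises $1 - R(x, y)$ over samples of other classes, greedy forward selection on the resulting dependency degree, and a noise threshold `tau`. All function names and parameters are hypothetical.

```python
from statistics import mean
from typing import List, Sequence

Sample = Sequence[float]


def similarity(x: Sample, y: Sample, feats: Sequence[int]) -> float:
    """Fuzzy similarity on a feature subset (assumed: 1 - |x_a - y_a|, min t-norm)."""
    if not feats:
        return 1.0  # with no features, every pair is indistinguishable
    return min(1.0 - abs(x[a] - y[a]) for a in feats)


def lower_membership(X, labels, feats) -> List[float]:
    """Membership of each sample in the (relaxed) lower approximation of its class.

    Classic fuzzy rough sets take the MINIMUM of 1 - R(x, y) over samples y of
    other classes; this sketch takes the MEAN instead (assumption), so a single
    mislabelled sample does not drag down all of its neighbours.
    """
    result = []
    for i, x in enumerate(X):
        gaps = [1.0 - similarity(x, y, feats)
                for j, y in enumerate(X) if labels[j] != labels[i]]
        result.append(mean(gaps) if gaps else 1.0)
    return result


def dependency(X, labels, feats) -> float:
    """Average membership: how well the feature subset separates the classes."""
    return mean(lower_membership(X, labels, feats))


def greedy_select(X, labels, eps: float = 1e-9) -> List[int]:
    """Forward selection: repeatedly add the feature that most raises dependency."""
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best = dependency(X, labels, selected)
    while remaining:
        score, feat = max((dependency(X, labels, selected + [a]), a) for a in remaining)
        if score <= best + eps:
            break  # no remaining feature adds discriminating power
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected


def flag_noisy(X, labels, feats, tau: float = 0.1) -> List[int]:
    """Indices of samples whose membership falls below the hypothetical threshold tau."""
    return [i for i, m in enumerate(lower_membership(X, labels, feats)) if m < tau]


# Toy data: feature 0 separates the classes, feature 1 adds nothing;
# the last sample is labelled 1 but sits among the class-0 points.
X = [[0.10, 0.50], [0.20, 0.40], [0.80, 0.50], [0.90, 0.60], [0.15, 0.45]]
labels = [0, 0, 1, 1, 1]
feats = greedy_select(X, labels)
print(feats, flag_noisy(X, labels, feats))  # [0] [4]
```

Averaging is the choice that matters for sample selection in this toy case: with the classic minimum, the mislabelled sample and its two class-0 neighbours would all receive the same membership (0.05), whereas the mean isolates the outlier. The sketch recomputes all pairwise similarities for every candidate subset, so it does not reflect the efficiency gains reported above.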