Abstract: Clustering, as a fundamental technique in data mining and machine learning, aims to partition data into meaningful groups based on the inherent relationships among data. However, traditional clustering algorithms typically assume that data have a convex, hyperspherical geometry, in which clusters have clearly defined boundaries and do not overlap. In contrast, real-world data often exhibit complex, non-convex geometries, which make these assumptions ineffective and lead to inaccurate clustering results that fail to capture the intrinsic structure of the data. To address this challenge, this paper proposes a novel granular clustering method based on an enhanced granularity representation that further refines the principle of justifiable granularity. By introducing a more precise and flexible hyper-box granulation mechanism, the method dynamically adapts to the topology of the data, thereby improving clustering accuracy. By defining the degree of aggregation and the degree of discreteness between data points, the importance of each attribute in the feature space is quantified, leading to the design of a novel hyper-box feature selection (HBFS) algorithm. This algorithm integrates the granular clustering principle into the feature selection process, reducing the impact of redundant features and noise and thus improving clustering efficiency and interpretability. To validate the effectiveness and superiority of the proposed method, extensive experiments were conducted on fifteen publicly available datasets, comparing the performance of HBFS with classical and state-of-the-art feature selection methods. The results and statistical significance tests show that HBFS significantly outperforms existing feature selection methods across various evaluation metrics.
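The abstract does not give the formulas for the degree of aggregation or the degree of discreteness, so the following is only a minimal sketch of the general idea under stated assumptions, not the HBFS algorithm from the paper. It assumes cluster labels are already known and that each cluster is summarized by an axis-aligned hyper-box (its per-feature [min, max] interval). A feature then scores highly when the cluster boxes are narrow along it (an assumed aggregation proxy) and overlap little with one another (an assumed discreteness proxy). The function name `hyperbox_feature_scores` and both proxies are hypothetical choices made for illustration.

```python
from itertools import combinations


def hyperbox_feature_scores(X, labels, eps=1e-12):
    """Hypothetical heuristic, not the paper's HBFS: score each feature by
    how narrow and how well separated the per-cluster hyper-box edges are."""
    n_features = len(X[0])
    clusters = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        span = max(column) - min(column)
        if span == 0:
            # A constant feature cannot separate clusters.
            scores.append(0.0)
            continue
        # Interval of each cluster on feature j (one edge of its hyper-box).
        boxes = []
        for c in clusters:
            vals = [row[j] for row, lab in zip(X, labels) if lab == c]
            boxes.append((min(vals), max(vals)))
        # Assumed aggregation proxy: mean box width relative to the feature range
        # (smaller means points are more tightly aggregated within clusters).
        width = sum(hi - lo for lo, hi in boxes) / len(boxes) / span
        # Assumed discreteness proxy: 1 - overlap/union, averaged over cluster pairs.
        seps = []
        for (lo1, hi1), (lo2, hi2) in combinations(boxes, 2):
            overlap = max(0.0, min(hi1, hi2) - max(lo1, lo2))
            union = max(hi1, hi2) - min(lo1, lo2)
            seps.append(1.0 - overlap / (union + eps))
        separation = sum(seps) / len(seps) if seps else 0.0
        scores.append(separation * (1.0 - width))
    return scores


# Usage: feature 0 separates the two clusters; feature 1 does not.
X = [[0.0, 0.0], [0.1, 1.0], [1.0, 0.1], [1.1, 0.9]]
labels = [0, 0, 1, 1]
scores = hyperbox_feature_scores(X, labels)
ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
print(scores, ranked)
```

Ranking features by such a score and keeping the top ones is one simple filter-style selection scheme; the paper's method additionally ties the hyper-boxes to its granular clustering, which this sketch does not attempt to reproduce.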
External IDs: dblp:journals/tkde/LiYPZZ25