Reducing the Search Space for Optimal Clustering Parameters Using a Small Amount of Labeled Data

Published: 31 Dec 2023, Last Modified: 27 Jan 2026, Scientific and Technical Information Processing, Everyone, CC BY 4.0
Abstract: This article presents a method for reducing the search space of clustering parameters by selecting the most appropriate data transformation methods and dissimilarity measures before clustering is actually run. To compare the candidate methods, the silhouette coefficient is computed on a small labeled dataset, with the class labels treated as cluster labels. Results of an experimental validation of the approach on news-text clustering are presented.
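The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the transformations or dissimilarity measures that were compared, so the two transformations (identity, L2 normalization), the two measures (Euclidean, cosine), the toy vectors, and all function names below are assumptions chosen for illustration. The sketch scores each (transformation, measure) pair by the mean silhouette coefficient, using the known class labels in place of cluster labels, and ranks the pairs so that only the best ones need to be carried into the actual clustering runs.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_dissimilarity(a, b):
    """One minus cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def identity(vectors):
    """Hypothetical 'no transformation' baseline."""
    return [list(v) for v in vectors]


def l2_normalize(vectors):
    """Scale each vector to unit length (zero vectors stay zero)."""
    out = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        out.append([x / norm if norm else 0.0 for x in v])
    return out


def silhouette(vectors, labels, dissimilarity):
    """Mean silhouette coefficient with class labels used as cluster labels."""
    n = len(vectors)
    scores = []
    for i in range(n):
        # Collect dissimilarities from point i to every other point, by label.
        by_label = {}
        for j in range(n):
            if i != j:
                d = dissimilarity(vectors[i], vectors[j])
                by_label.setdefault(labels[j], []).append(d)
        own = by_label.pop(labels[i], None)
        if not own or not by_label:
            scores.append(0.0)  # singleton class or single class: score 0
            continue
        a = sum(own) / len(own)                                 # mean within-class
        b = min(sum(d) / len(d) for d in by_label.values())    # nearest other class
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


def rank_configurations(vectors, labels, transforms, measures):
    """Score every (transformation, measure) pair; best first."""
    results = []
    for (t_name, t), (m_name, m) in product(transforms.items(), measures.items()):
        results.append((silhouette(t(vectors), labels, m), t_name, m_name))
    return sorted(results, reverse=True)


# Toy labeled sample (assumed data): classes differ in direction, not magnitude.
vectors = [[1, 0], [10, 1], [0, 1], [1, 10]]
labels = [0, 0, 1, 1]

ranking = rank_configurations(
    vectors,
    labels,
    transforms={"identity": identity, "l2": l2_normalize},
    measures={"euclidean": euclidean, "cosine": cosine_dissimilarity},
)
for score, t_name, m_name in ranking:
    print(f"{t_name:9s} {m_name:10s} {score:+.3f}")
```

Because the score needs only pairwise dissimilarities and the known labels, no clustering algorithm is run at this stage. Low-scoring pairs can be dropped before the more expensive parameter search begins.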