Application of Pseudometric Functions in Clustering and a Novel Similarity Measure Based on Path Information Discrepancy

Haochen You, Baojing Liu

Published: 01 Jan 2025, Last Modified: 06 Jan 2026, License: CC BY-SA 4.0

Abstract: Similarity measures are widely used in machine learning, and they are generally required to satisfy the axioms of a metric. Although some studies have ventured beyond these constraints, such work is limited to result-oriented intuition or to the application of individual non-metric functions, and lacks a systematic, abstract, general treatment. Relaxing certain metric axioms makes similarity measures more flexible, but it also introduces considerable uncertainty. In this paper, we focus mainly on violations of the identity axiom. We define a feature variable, the \(\lambda\)-Scaling Index, to measure identity differences across similarity functions, and we theoretically establish its quantitative relationship with mainstream clustering performance metrics. We also distill the core idea of DBSCAN and introduce a new similarity measure based on path information discrepancy, the Steepness Value, which is itself a pseudometric. Numerical experiments on real image datasets confirm the practical value of the \(\lambda\)-Scaling Index as a prior indicator and the accuracy of the upper-bound estimates, and show that the new similarity function significantly improves the clustering performance of DBSCAN, indicating promising applications.
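The difference between a metric and a pseudometric comes down to the identity axiom: a pseudometric may assign distance 0 to two distinct points. The sketch below is a minimal, generic illustration under that definition only. It is not the authors' \(\lambda\)-Scaling Index or Steepness Value, neither of which is defined in this abstract. It checks the pseudometric axioms on a dissimilarity matrix and counts the distinct pairs the function fails to separate. The function name `check_pseudometric` and the toy projection distance are hypothetical.

```python
import numpy as np

def check_pseudometric(D, tol=1e-12):
    """Check pseudometric axioms on a square dissimilarity matrix D.

    Returns (is_pseudometric, identity_violations). identity_violations counts
    unordered pairs i != j with D[i, j] == 0 (within tol): distinct points the
    function cannot separate. A pseudometric allows this; a metric forbids it.
    """
    D = np.asarray(D, dtype=float)
    if D.ndim != 2 or D.shape[0] != D.shape[1]:
        raise ValueError("D must be a square matrix")
    n = D.shape[0]
    zero_diag = np.allclose(np.diag(D), 0.0, atol=tol)   # d(x, x) = 0
    nonneg = bool((D >= -tol).all())                      # d(x, y) >= 0
    symmetric = np.allclose(D, D.T, atol=tol)             # d(x, y) = d(y, x)
    # Triangle inequality: D[i, k] <= min_j (D[i, j] + D[j, k]).
    # The broadcast sum has shape (n, n, n) with entry [i, j, k] = D[i, j] + D[j, k].
    best_detour = (D[:, :, None] + D[None, :, :]).min(axis=1)
    triangle = bool((D <= best_detour + tol).all())
    # Count distinct pairs (upper triangle, i < j) at distance 0.
    iu = np.triu_indices(n, k=1)
    identity_violations = int((np.abs(D[iu]) <= tol).sum())
    return bool(zero_diag and nonneg and symmetric and triangle), identity_violations

# Hypothetical example: d(x, y) = |x[0] - y[0]| ignores the second coordinate,
# so (0, 0) and (0, 1) are at distance 0 although they are distinct points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
D_pseudo = np.abs(X[:, None, 0] - X[None, :, 0])
print(check_pseudometric(D_pseudo))   # (True, 1)

# The Euclidean distance on the same points is a true metric: no violations.
D_euc = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(check_pseudometric(D_euc))      # (True, 0)
```

A precomputed matrix like `D_pseudo` can be passed to a density-based clusterer such as scikit-learn's `DBSCAN(metric="precomputed")`; points with zero mutual distance are then always neighbors, which is one concrete way a violation of identity changes clustering behavior.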

External IDs: doi:10.1007/978-981-96-6579-2_5