Random-Walk Based Approximate k-Nearest Neighbors Algorithm for Diffusion State Distance

Lenore J. Cowen, Xiaozhe Hu, Junyuan Lin, Yue Shen, Kaiyi Wu

Published: 2021, Last Modified: 15 May 2025, LSSC 2021, CC BY-SA 4.0

Abstract: Diffusion State Distance (DSD) is a data-dependent metric that compares data points using a data-driven diffusion process, providing a powerful tool for learning the underlying structure of high-dimensional data. Finding exact nearest neighbors in the DSD metric is computationally expensive. In this paper, we propose a new random-walk based algorithm that efficiently finds approximate k-nearest neighbors with high empirical accuracy. Numerical results on real-world protein-protein interaction networks illustrate the efficiency and robustness of the proposed algorithm, and the resulting sets of approximate k-nearest neighbors perform well when used to predict proteins’ functional labels.
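
To make the cost of the exact computation concrete, the following is a minimal sketch of a brute-force DSD nearest-neighbor search on a small graph. It is not the approximate algorithm proposed in the paper. It assumes the truncated-walk formulation of DSD, in which each node is represented by the expected number of visits to every node during a random walk of a fixed number of steps, and two nodes are compared by the L1 distance between these vectors. The function names, the `steps` parameter, and the choice of the L1 norm are illustrative assumptions and do not come from the abstract.

```python
from typing import List

Matrix = List[List[float]]


def transition_matrix(adj: Matrix) -> Matrix:
    """Row-normalize an adjacency matrix into a random-walk transition matrix."""
    P = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            raise ValueError("isolated node: random walk is undefined")
        P.append([w / degree for w in row])
    return P


def expected_visits(P: Matrix, steps: int) -> Matrix:
    """Return He, where He[u][v] is the expected number of visits to v
    by a walk of `steps` steps started at u (the start counts as a visit).

    Assumption: truncated-walk form, He = I + P + P^2 + ... + P^steps.
    """
    n = len(P)
    current = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    he = [row[:] for row in current]
    for _ in range(steps):
        # Advance every walk one step: current <- current @ P.
        current = [
            [sum(current[i][m] * P[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
        for i in range(n):
            for j in range(n):
                he[i][j] += current[i][j]
    return he


def dsd_knn(adj: Matrix, source: int, k: int, steps: int = 10) -> List[int]:
    """Exact k-nearest neighbors of `source` under the (assumed) L1 DSD.

    Ties are broken by node index. Building He for all nodes costs
    O(steps * n^3) with this dense approach, which is what motivates
    approximate methods on large networks.
    """
    he = expected_visits(transition_matrix(adj), steps)
    distances = [
        (sum(abs(a - b) for a, b in zip(he[source], he[v])), v)
        for v in range(len(adj))
        if v != source
    ]
    distances.sort()
    return [v for _, v in distances[:k]]
```

For example, calling `dsd_knn(adj, 0, 2)` on a symmetric 0/1 adjacency matrix returns the two nodes whose visit profiles are closest to that of node 0. Every node's visit vector has to be computed before any comparison, which is the expense that an approximate search aims to avoid.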