TL;DR: This paper proposes ParPIC, a method for clustering nodes in directed graphs by comparing t-step random-walk transition probabilities and grouping nodes with similar diffusion profiles into the same cluster.
Abstract: Vertex-level clustering for directed graphs (digraphs) remains challenging because edge directionality breaks the key assumptions underlying popular spectral methods, which also incur the overhead of eigen-decomposition. This paper proposes *Parametrized Power Iteration Clustering* (ParPIC), a random-walk-based clustering method for weakly connected digraphs. It builds on the Power-Iteration Clustering paradigm, which uses the rows of the iterated diffusion operator as a data embedding. ParPIC has three important features: the use of parametrized reversible random walk operators, the automatic tuning of the diffusion time, and the efficient truncation of the final embedding, which produces low-dimensional data representations and reduces complexity. Empirical results on synthetic and real-world graphs demonstrate that ParPIC achieves competitive clustering accuracy with improved scalability relative to spectral and teleportation-based methods.
Lay Summary: Clustering nodes in directed networks, where connections have a direction, like follower relationships on social media or links between web pages, is difficult because many existing algorithms are designed for undirected networks. Existing methods are often computationally expensive for large graphs. In addition, some approaches simplify directed networks by ignoring edge directionality, which can lead to the loss of important structural information.
This paper introduces a new method called Parametrized Power Iteration Clustering (ParPIC) for grouping related nodes in directed graphs. Instead of using costly spectral techniques, ParPIC represents each node by its transition probabilities after a number of random-walk steps. Nodes whose probability distributions are similar (meaning they tend to reach the same parts of the graph) are grouped into the same cluster. The method includes three main improvements: it can adapt the random walk process through tunable parameters, it automatically determines how long the diffusion process should run, and it compresses the resulting data representation to make computations more efficient.
Tests on both artificial and real-world networks show that ParPIC can match the clustering quality of existing methods while scaling better to large datasets and reducing computational cost.
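The core idea in the summary above (represent each node by its t-step transition probabilities, then group nodes with similar rows) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: it uses the plain row-normalised random walk rather than ParPIC's parametrized reversible operators, a hand-picked diffusion time `t` rather than automatic tuning, no embedding truncation, and an ordinary k-means as a stand-in clustering step. The toy graph and all function names (`transition_matrix`, `diffusion_embedding`, `kmeans`) are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: plain t-step random-walk embedding + k-means.
# NOT ParPIC itself (no parametrized operator, no automatic t, no truncation).

def transition_matrix(n, weighted_edges):
    """Row-stochastic random-walk matrix P of a weighted digraph."""
    A = [[0.0] * n for _ in range(n)]
    for u, v, w in weighted_edges:
        A[u][v] += w
    P = []
    for row in A:
        out_weight = sum(row)
        if out_weight == 0:
            raise ValueError("every node needs at least one outgoing edge")
        P.append([a / out_weight for a in row])
    return P


def matmul(X, Y):
    """Dense matrix product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def diffusion_embedding(P, t):
    """Rows of P^t: row i is the distribution of a walk started at i after t steps."""
    if t < 1:
        raise ValueError("t must be at least 1")
    E = P
    for _ in range(t - 1):
        E = matmul(E, P)
    return E


def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy digraph: two densely connected triangles {0,1,2} and {3,4,5},
# joined by two weak directed links (weight 0.1) so the graph stays connected.
edges = [(u, v, 1.0)
         for group in ((0, 1, 2), (3, 4, 5))
         for u in group for v in group if u != v]
edges += [(2, 3, 0.1), (5, 0, 0.1)]

P = transition_matrix(6, edges)
embedding = diffusion_embedding(P, t=3)   # t is hand-picked here, not tuned
labels = kmeans(embedding, k=2)
print(labels)
```

On this toy graph a walk rarely crosses the weak links within three steps, so the rows of the embedding for nodes in the same triangle stay close to each other and far from those of the other triangle. The dense matrix power used here costs far more than would be acceptable on large graphs; it is kept only to make the embedding explicit.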
Link To Code: https://github.com/Gwendal-Debaussart/parpic
Primary Area: General Machine Learning->Clustering
Keywords: Clustering, Diffusion geometry, Power-Iteration, time-selection, random walk clustering, directed graphs
Submission Number: 1109