Abstract: Scalable graph embedding on large networks is challenging because of the complexity of graph structures and limited computing resources. Recent research shows that the multi-level framework can enhance the scalability of graph embedding methods with little loss of quality. In general, methods using this framework first coarsen the original graph into a series of smaller graphs and then learn the representations of the original graph from them in an efficient manner. However, to the best of our knowledge, most multi-level based methods do not have a parallel implementation. Meanwhile, the emergence of high-performance computing for machine learning provides an opportunity to boost graph embedding by distributed computing. In this paper, we propose a Distributed MultI-Level Embedding (DistMILE; our code is available at https://github.com/heyuntian/DistMILE) framework to further improve the scalability of graph embedding. DistMILE leverages a novel shared-memory parallel algorithm for graph coarsening and a distributed training paradigm for embedding refinement. With the advantage of high-performance computing techniques, DistMILE can smoothly scale different base embedding methods over large networks. Our experiments demonstrate that DistMILE learns representations of quality comparable to other baselines, while reducing the time of learning embeddings on large-scale networks to hours. Results show that DistMILE can achieve up to 28x speedup compared with the popular multi-level embedding framework MILE and expedite existing embedding methods with up to 40x speedup.
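To make the multi-level idea described in the abstract concrete, the sketch below shows a minimal, single-threaded version of the coarsen-then-refine pipeline: repeatedly merge matched neighbor pairs into a smaller graph, run any base embedding method on the coarsest graph, and project the vectors back up level by level. All names here (`heavy_edge_matching`, `multilevel_embed`, `base_embed`) are hypothetical and simplified for illustration; DistMILE's actual coarsening is a shared-memory parallel algorithm and its refinement uses a distributed trained model rather than the naive copy-up used here.

```python
import networkx as nx


def heavy_edge_matching(G):
    """One coarsening level: greedily merge each node with its heaviest unmatched neighbor.

    Returns the coarsened graph and a fine-node -> coarse-node mapping.
    Simplified stand-in for DistMILE's parallel coarsening step.
    """
    mapping, used, next_id = {}, set(), 0
    for u in G.nodes():
        if u in used:
            continue
        candidates = [v for v in G.neighbors(u) if v not in used and v != u]
        v = max(candidates, key=lambda w: G[u][w].get("weight", 1.0), default=None)
        mapping[u] = next_id
        used.add(u)
        if v is not None:
            mapping[v] = next_id
            used.add(v)
        next_id += 1
    coarse = nx.Graph()
    coarse.add_nodes_from(set(mapping.values()))
    for a, b, data in G.edges(data=True):
        ca, cb = mapping[a], mapping[b]
        if ca == cb:
            continue
        # Accumulate weights of fine edges collapsed onto the same coarse edge.
        w = data.get("weight", 1.0) + coarse.get_edge_data(ca, cb, {}).get("weight", 0.0)
        coarse.add_edge(ca, cb, weight=w)
    return coarse, mapping


def multilevel_embed(G, base_embed, levels=3, dim=128):
    """Coarsen `levels` times, embed the coarsest graph, then refine back up.

    `base_embed(graph, dim)` is any base method (e.g. DeepWalk/node2vec wrapper)
    returning a dict {node: vector}. The copy-up refinement below is the
    simplest possible choice; DistMILE refines with a learned model instead.
    """
    graphs, mappings = [G], []
    for _ in range(levels):
        coarse, mapping = heavy_edge_matching(graphs[-1])
        graphs.append(coarse)
        mappings.append(mapping)
    emb = base_embed(graphs[-1], dim)  # embed only the smallest graph
    for G_fine, mapping in zip(reversed(graphs[:-1]), reversed(mappings)):
        emb = {u: emb[mapping[u]] for u in G_fine.nodes()}
    return emb
```

The speedup reported in the abstract comes from the fact that the expensive base embedding runs only on the coarsest (smallest) graph, while coarsening and refinement are the steps DistMILE parallelizes and distributes.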