Keywords: scalable algorithm, time complexity, space complexity, large-scale data, tensor clustering, seeded clustering
TL;DR: We develop a large-scale multiway clustering framework that substantially reduces computational costs with little sacrifice in accuracy.
Abstract: Multiway clustering methods for higher-order tensor observations have been developed in various fields, including recommendation systems, neuroimaging, and social networks. However, high computational costs hinder the application of tensor-based approaches to real-world large-scale data. Here, we propose LS-TBM, a large-scale multiway clustering framework under the tensor block model with accuracy guarantees. LS-TBM leverages seeded clustering to break the expensive high-dimensional tensor clustering problem into two fast low-dimensional steps. Under certain signal conditions, our two-step algorithm substantially reduces the time and space complexities from polynomial to logarithmic rates while maintaining exact recovery of the community structure. We also establish the theoretical phase transition of LS-TBM's performance, which is characterized by a key interplay between signal levels and seed sizes. Numerical experiments with synthetic data and real large-scale Uber Pickup data highlight LS-TBM's superior performance in practice.
Submission Number: 78