Large-Scale Multiway Clustering with Seeded Clustering

Published: 11 Feb 2025, Last Modified: 20 Mar 2025
Venue: CPAL 2025 (Proceedings Track), Poster
License: CC BY 4.0
Keywords: scalable algorithm, time complexity, space complexity, large-scale data, tensor clustering, seeded clustering
TL;DR: We develop a large-scale multiway clustering framework that substantially reduces computational cost with little loss of accuracy.
Abstract: Multiway clustering methods for higher-order tensor observations have been developed in various fields, including recommendation systems, neuroimaging, and social networks. However, high computational costs hinder the application of tensor-based approaches to real-world large-scale data. Here, we propose LS-TBM, a large-scale multiway clustering framework under the tensor block model with accuracy guarantees. LS-TBM leverages seeded clustering to break the expensive high-dimensional tensor clustering problem into two fast low-dimensional steps. Our two-step algorithm substantially reduces the time and space complexities from polynomial to logarithmic rates while maintaining exact recovery of community structures under certain signal conditions. We also establish a theoretical phase transition in LS-TBM performance, governed by a key interplay between signal levels and seed sizes. Numerical experiments with synthetic data and real large-scale Uber Pickup data highlight LS-TBM's superior performance in practice.
Submission Number: 78
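
The two-step idea in the abstract can be illustrated with a minimal sketch. This is not the authors' LS-TBM algorithm; it is one plausible reading of "seeded clustering" under stated assumptions: a cubic order-3 tensor, the same seed index set on every mode, clustering along mode 0 only, and a plain k-means with farthest-point initialization as the low-dimensional clustering step. The function names `farthest_point_kmeans` and `seeded_cluster_mode0` are hypothetical.

```python
import numpy as np

def farthest_point_kmeans(X, k, n_iter=20):
    """Lloyd's k-means on the rows of X with deterministic farthest-point init."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of every row to its nearest already-chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def seeded_cluster_mode0(Y, k, seed_idx):
    """Assumed two-step scheme: cluster a small seed sub-tensor, then
    assign every mode-0 index to the nearest seed centroid."""
    seed_idx = np.asarray(seed_idx)
    s = len(seed_idx)
    # Step 1: cluster only the s x s x s seed sub-tensor (unfolded along mode 0).
    sub = Y[np.ix_(seed_idx, seed_idx, seed_idx)].reshape(s, -1)
    _, centers = farthest_point_kmeans(sub, k)
    # Step 2: compare each full mode-0 slice, restricted to seed columns,
    # against the k seed centroids. Each comparison touches s^2 entries, not n^2.
    rows = Y[:, seed_idx][:, :, seed_idx].reshape(Y.shape[0], -1)
    dist = ((rows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

# Synthetic tensor block model: entry (i, j, l) has mean core[z_i, z_j, z_l].
rng = np.random.default_rng(0)
n, k = 60, 3
z = np.arange(n) % k                       # true community labels
core = np.arange(k ** 3, dtype=float).reshape(k, k, k)
Y = core[np.ix_(z, z, z)] + rng.normal(scale=0.1, size=(n, n, n))
labels = seeded_cluster_mode0(Y, k, seed_idx=np.arange(0, n, 4))
```

In this toy setting the signal is strong and the seed set covers every community, so the recovered `labels` should match `z` up to a relabeling of the clusters. How small the seed set can be for a given signal level is the kind of trade-off the abstract refers to as a phase transition; the sketch does not attempt to reproduce that analysis.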