LMCC-MBC: Metric-Constrained Model-Based Clustering with Wasserstein-2 Distance of Gaussian Markov Random Fields

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX
Primary Area: general machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: unsupervised learning, clustering, model-based clustering, metric-constrained clustering
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: A wide range of temporal (1D) and spatial (2D) data analysis problems can be formulated as model-based clustering problems given metric constraints. For example, subsequence clustering of multivariate time series is constrained by 1D temporal continuity, while urban functional area identification is constrained by the spatial proximity in the 2D space. Existing works model such metric constraints independent of the model estimation process, failing to leverage the correlation between adjacent estimated models and their locations in the metric space. To solve this problem we propose a novel metric-constrained model-based clustering algorithm LMCC-MBC that softly requires the Wasserstein-2 distance between estimated model parameters (such as those of Gaussian Markov Random Fields) to be a locally monotonic continuous function of the metric distance. We theoretically prove that satisfaction of this requirement guarantees intra-cluster cohesion and inter-cluster separation. Moreover, without explicitly optimizing log-likelihood LMCC-MBC voids the expensive EM-step that is needed by previous approaches (e.g., TICC and STICC), and enables faster and more stable clustering. Experiments on both 1D and 2D synthetic as well as real-world datasets demonstrate that our algorithm successfully captures the latent correlation between the estimated models and the metric constraints, and outperforms strong baselines by a margin up to 14.3% in ARI (Adjusted Rand Index) and 32.1% in NMI (Normalized Mutual Information).
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6618
Loading