TL;DR: We give hierarchical clustering algorithms that use learning-augmented oracles to achieve approximation bounds overcoming known hardness results.
Abstract: Hierarchical clustering (HC) is an important data analysis technique in which the goal is to recursively partition a dataset into a tree-like structure that groups similar data points together at every level of granularity. Unfortunately, for many of the proposed HC objectives, strong hardness-of-approximation results rule out efficient approximation algorithms. We consider the problem of hierarchical clustering given auxiliary information from natural oracles in the learning-augmented framework. Our main results are algorithms that, given access to such learning-augmented oracles, efficiently compute approximate HC trees for the celebrated objectives of Dasgupta and Moseley-Wang, with guarantees that overcome the known hardness barriers.
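As a brief reference (these are the standard definitions of the two objectives; they are not restated in the abstract): given pairwise similarities $w_{ij}$ between $n$ data points and an HC tree $T$, let $T[i \vee j]$ denote the subtree of $T$ rooted at the least common ancestor of leaves $i$ and $j$. Then
$$\mathrm{cost}_{\mathrm{Dasgupta}}(T) \;=\; \sum_{i < j} w_{ij}\,\bigl|\mathrm{leaves}(T[i \vee j])\bigr|, \qquad \mathrm{rew}_{\mathrm{MW}}(T) \;=\; \sum_{i < j} w_{ij}\,\bigl(n - \bigl|\mathrm{leaves}(T[i \vee j])\bigr|\bigr),$$
where Dasgupta's cost is minimized and the Moseley-Wang reward is maximized; the two sum to $n \sum_{i<j} w_{ij}$ for every tree $T$, yet they behave very differently with respect to approximation.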
Lay Summary: We studied the hierarchical clustering (HC) problem, which aims to summarize data into increasingly fine-grained sub-clusters based on the similarity and dissimilarity of the data points.
Optimizing hierarchical clustering objectives is a major problem in machine learning, with applications in artificial intelligence and medical science.
Since the problem is hard in general, we considered a learning-augmented version that takes advantage of a (potentially erroneous) oracle obtained from machine learning models.
This oracle provides 'hints' to the HC algorithm, in the hope of obtaining improved results.
Our paper showed that it is indeed possible to obtain such improved HC algorithms with the learning-augmented oracle.
Our results shed light on how machine learning models can advance the optimization of HC, and our algorithms have the potential to be useful in a broad range of applications, e.g., medical modeling.
Primary Area: Theory
Keywords: hierarchical clustering, learning-augmented algorithms
Submission Number: 9204