Keywords: Explainable Clustering, k-medians, Decision Tree
TL;DR: We present the first algorithm for explainable $k$-medians under $\ell_p$ norm for every finite $p \geq 1$ and show how to implement our algorithm in a dynamic setting.
Abstract: We study the problem of explainable $k$-medians clustering introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (2020).
In this problem, the goal is to construct a threshold decision tree that partitions data into $k$ clusters while minimizing the $k$-medians objective.
These trees are interpretable because each internal node makes a simple decision by thresholding a single feature, allowing users to trace and understand how each point is assigned to a cluster.
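To make the assignment rule concrete, here is a minimal sketch (not the paper's algorithm) of how a point is routed through a threshold decision tree: each internal node compares a single coordinate against a threshold, and each leaf is a cluster. The node structure, feature indices, and thresholds below are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    feature: Optional[int] = None      # coordinate tested at an internal node
    threshold: Optional[float] = None  # split value for that coordinate
    left: Optional["Node"] = None      # subtree for x[feature] <= threshold
    right: Optional["Node"] = None     # subtree for x[feature] > threshold
    cluster: Optional[int] = None      # cluster label at a leaf


def assign(node: Node, x: list[float]) -> int:
    """Trace a point down the tree; the sequence of (feature, threshold)
    tests along the path explains why the point lands in its cluster."""
    while node.cluster is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.cluster


# Example: a tree with k = 3 leaves over 2-dimensional points.
tree = Node(feature=0, threshold=0.5,
            left=Node(cluster=0),
            right=Node(feature=1, threshold=2.0,
                       left=Node(cluster=1),
                       right=Node(cluster=2)))

print(assign(tree, [0.9, 3.1]))  # -> 2  (x[0] > 0.5, then x[1] > 2.0)
```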
We present the first algorithm for explainable $k$-medians under the $\ell_p$ norm for every finite $p \geq 1$. Our algorithm achieves an $\tilde{O}\big(p(\log k)^{1 + 1/p - 1/p^2}\big)$ approximation to the optimal $k$-medians cost. Previously, algorithms were known only for $p = 1$ and $p = 2$. For $p = 2$, our algorithm improves upon the existing bound of $\tilde O(\log^{3/2}k)$, and for $p = 1$, it matches the tight bound of $\log k + O(1)$ up to a multiplicative $O(\log \log k)$ factor.
We show how to implement our algorithm in a dynamic setting. The dynamic algorithm maintains an explainable clustering under a sequence of insertions and deletions, with amortized update time $O(d \log^3 k)$ and $O(\log k)$ recourse, making it suitable for large-scale and evolving datasets.
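The following sketch is illustrative only and is not the paper's dynamic algorithm: it shows what a dynamic explainable-clustering interface looks like and how recourse is measured (the number of existing points whose cluster label changes after an update). The `build_tree` and `assign` callables are assumed to be supplied, e.g. a static explainable-clustering routine and the tree-routing function from the sketch above.

```python
from typing import Callable


class DynamicExplainableClustering:
    """Illustrative wrapper: maintains cluster labels under point insertions
    and deletions, reporting the recourse of each update. The paper's
    algorithm achieves O(log k) recourse and O(d log^3 k) amortized update
    time; this naive version simply recomputes all assignments."""

    def __init__(self, build_tree: Callable, assign: Callable, k: int):
        self.build_tree = build_tree   # builds a threshold tree from points (assumed)
        self.assign = assign           # routes a point through the tree (assumed)
        self.k = k
        self.points: dict[int, list[float]] = {}   # point id -> coordinates
        self.labels: dict[int, int] = {}           # point id -> cluster label

    def _refresh(self) -> int:
        """Rebuild the tree, relabel all points, and return the recourse."""
        tree = self.build_tree(list(self.points.values()), self.k)
        new_labels = {pid: self.assign(tree, x) for pid, x in self.points.items()}
        recourse = sum(1 for pid, lbl in new_labels.items()
                       if pid in self.labels and self.labels[pid] != lbl)
        self.labels = new_labels
        return recourse

    def insert(self, pid: int, x: list[float]) -> int:
        self.points[pid] = x
        return self._refresh()

    def delete(self, pid: int) -> int:
        self.points.pop(pid, None)
        self.labels.pop(pid, None)
        return self._refresh()
```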
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 22713