Title: Optimal Explainable k-Medians via Random Coordinate Cuts

Abstract: Explainable k-medians clustering is a fundamental problem in unsupervised learning with increasing importance in critical applications. The RANDOMCOORDINATECUT algorithm, a simple polynomial-time randomized approach, has been independently proposed by several groups for this problem. While previous analyses established an O(log k log log k) competitive ratio, its optimality remained a conjecture. In this paper, we provide a tight analysis of the RANDOMCOORDINATECUT algorithm, proving that its competitive ratio is at most 2 ln k + 2. This bound matches the Ω(log k) lower bound established by Dasgupta et al. (2020) up to a constant factor, resolving an open question and improving upon prior analyses, including the weaker O(log² k) bound by Gamlath, Jia, Polak, and Svensson (2021). Our work demonstrates that random coordinate cuts achieve, up to a constant factor, the best possible competitive ratio for explainable k-medians in the ℓ₁ norm.

Section: Introduction
The increasing deployment of machine learning in critical domains like healthcare, finance, and public policy necessitates algorithmic transparency and interpretability. As machine learning models influence significant decisions, understanding their underlying logic is paramount. This paper addresses the challenge of explainable clustering, a crucial step towards transparent unsupervised learning. Specifically, we focus on explainable k-medians clustering, aiming to produce data partitions that are easily understood and visualized by humans.

Traditional clustering algorithms, such as k-means, k-medians, and k-medoids, are fundamental centroid-based methods. They partition data into Voronoi cells based on proximity to k centers. However, these Voronoi cells often exhibit complex boundaries (as illustrated in Figure 1), making the resulting clusters difficult for human users to comprehend and interpret.

To address this problem, Dasgupta, Frost, Moshkovitz, and Rashtchian [2020] introduced the concept of explainable k-means and k-medians clustering, proposing threshold decision trees as an intuitive and interpretable means to define clusters. A threshold decision tree is a binary space partitioning tree with $k$ leaves. Each internal node performs a threshold cut $(j, \theta)$, splitting data points based on whether $x_j \le \theta$ or $x_j > \theta$. This process recursively partitions the $d$-dimensional space into $k$ rectangular regions, each corresponding to a cluster $P_i$ (see Figure 1 for an illustration). The cost of such a threshold decision tree $T$ is measured using the standard k-medians objective: $\mathrm{cost}(X, T) = \sum_{i=1}^{k} \sum_{x \in P_i} \|x - \hat{c}_i\|_1$, where $\hat{c}_1, \dots, \hat{c}_k$ are the medians of clusters $P_1, \dots, P_k$ and $\|\cdot\|_1$ denotes the $\ell_1$-norm. Unlike unconstrained k-medians, points are not necessarily assigned to their globally closest center, but rather to the center of their assigned rectangular region.
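As a concrete illustration of the objective above, the following minimal Python sketch evaluates the k-medians cost of a given partition. The function names (`l1`, `cluster_median`, `kmedians_cost`) are hypothetical and chosen only for this example; the sketch assumes each cluster is a list of equal-length tuples and uses the coordinate-wise median as the cluster center, which minimizes the ℓ₁ cost within a cluster.

```python
from statistics import median


def l1(x, c):
    """l1 distance between two points given as equal-length tuples."""
    return sum(abs(a - b) for a, b in zip(x, c))


def cluster_median(points):
    """Coordinate-wise median of a non-empty list of points."""
    d = len(points[0])
    return tuple(median(p[j] for p in points) for j in range(d))


def kmedians_cost(clusters):
    """Sum over clusters of the l1 distances to the cluster median."""
    total = 0.0
    for points in clusters:
        if not points:
            continue  # an empty cluster contributes nothing
        c = cluster_median(points)
        total += sum(l1(x, c) for x in points)
    return total


# One cluster with median (1, 0): distances are 1, 1 and 3.
print(kmedians_cost([[(0, 0), (2, 0), (1, 3)]]))  # 5.0
```

In an explainable clustering, the clusters passed to `kmedians_cost` would be the sets of data points falling into the rectangular cells of the threshold tree.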

The 'price of explainability' quantifies the trade-off between interpretability and clustering quality. It is defined as the ratio of the explainable k-medians cost to the optimal unconstrained k-medians cost. Dasgupta et al. [2020] demonstrated that this price can be bounded in terms of k alone, independently of the data set size. They introduced a greedy algorithm that achieves an O(k)-competitive ratio for explainable k-medians, given k reference centers. The standard approach first obtains these reference centers via an off-the-shelf approximation algorithm for k-medians and then runs an α-competitive algorithm for explainable k-medians on them. They also established an Ω(log k) lower bound on the price of explainability for both k-medians and k-means.

The problem of explainable clustering has garnered significant research attention. Several algorithms, including those by Makarychev and Shan [2021] and Esfandiari et al. [2022], have been proposed, achieving near-optimal competitive ratios of Õ(log k) for k-medians and Õ(k) for k-means. A particularly influential and simple algorithm, independently discovered by multiple groups, is the RANDOMCOORDINATECUT algorithm.

The RANDOMCOORDINATECUT algorithm constructs a threshold decision tree from a given set of $k$ reference centers $c_1, \dots, c_k$. It recursively partitions the $d$-dimensional space. Starting with all centers at the root, at each step the algorithm chooses a random coordinate $j$ and a random threshold $\theta$. This cut is applied to every current cell: if the cut separates centers in a cell, that cell is divided into two sub-cells; otherwise, the cell is left unchanged. This process continues until each leaf of the tree contains exactly one reference center. The pseudo-code for this algorithm is presented in Figure 2. Prior analyses by Makarychev and Shan [2021] and Esfandiari et al. [2022] established an $O(\log k \log\log k)$ competitive ratio for RANDOMCOORDINATECUT. It was conjectured, however, that the algorithm is optimal and achieves an $O(\log k)$ competitive ratio, more specifically $H_{k-1} + 1$, where $H_k$ is the $k$-th harmonic number.
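The description above can be sketched in Python as follows. This is a minimal illustration written from the prose description, not a transcription of the pseudo-code in Figure 2. It assumes that the centers are distinct tuples in the cube $[-M, M]^d$ and that the threshold is drawn uniformly from $[-M, M]$; the names `random_coordinate_cut` and `assign`, and the nested-dictionary tree representation, are hypothetical.

```python
import random


def random_coordinate_cut(centers, M, rng=None):
    """Build a threshold tree whose leaves each hold one center index."""
    if len(set(centers)) != len(centers):
        raise ValueError("centers must be distinct")
    rng = rng or random.Random()
    d = len(centers[0])
    root = {"centers": list(range(len(centers)))}
    # Cells that still contain more than one center.
    open_cells = [root] if len(centers) > 1 else []
    while open_cells:
        # One random cut (j, theta) is shared by all current cells.
        j = rng.randrange(d)
        theta = rng.uniform(-M, M)
        still_open = []
        for cell in open_cells:
            left = [i for i in cell["centers"] if centers[i][j] <= theta]
            right = [i for i in cell["centers"] if centers[i][j] > theta]
            if left and right:
                # The cut separates centers in this cell: split it.
                cell["cut"] = (j, theta)
                cell["left"] = {"centers": left}
                cell["right"] = {"centers": right}
                for child in (cell["left"], cell["right"]):
                    if len(child["centers"]) > 1:
                        still_open.append(child)
            else:
                # All centers lie on one side: the cell is unchanged.
                still_open.append(cell)
        open_cells = still_open
    return root


def assign(tree, x):
    """Follow the threshold cuts from the root; return the center index."""
    node = tree
    while "cut" in node:
        j, theta = node["cut"]
        node = node["left"] if x[j] <= theta else node["right"]
    return node["centers"][0]
```

For example, `tree = random_coordinate_cut([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], 2.0)` followed by `assign(tree, (0.9, 0.1))` returns the index of the reference center whose leaf contains the point; each center is always assigned to its own index.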

Our Results. In this paper, we present a tight analysis of the RANDOMCOORDINATECUT algorithm for explainable k-medians. We prove that its competitive ratio is at most 2 ln k + 2. This bound is optimal up to a constant factor: it matches the Ω(log k) lower bound established by Dasgupta, Frost, Moshkovitz, and Rashtchian [2020], thereby resolving the open question of whether the algorithm achieves an O(log k) competitive ratio. Our analysis is not only tight but also notably simple. It relies on a framework centered around the 'Set Elimination Game,' a concept implicitly present in prior work, which we formalize and analyze to derive our main result.

Concurrent Work. Independently and concurrently with our work, Gupta, Pittu, Svensson, and Yuan [2023] also proved an $O(\log k)$ bound on the price of explainability for k-medians. They showed that the competitive ratio of RANDOMCOORDINATECUT is $1 + H_{k-1}$, where $H_k$ is the $k$-th harmonic number. Their work similarly answers the open question raised by Gamlath, Jia, Polak, and Svensson [2021] and further improves the competitive ratio for explainable k-means from $O(k \log k)$ to $O(k \log\log k)$.

Section: Set Elimination Game
In this section, we define the set elimination game. Consider a discrete finite measure space $(\Omega, \mu)$. In this space, each element $\omega \in \Omega$ has measure $\mu(\omega)$, and the measure of every set $S \subseteq \Omega$ equals $\mu(S) = \sum_{\omega \in S} \mu(\omega)$. Let $S_1, S_2, \dots, S_k \subset \Omega$ be $k$ distinct sets, which may overlap with each other. The set elimination game proceeds in a series of rounds. Initially, all sets $S_1, \dots, S_k$ enter the competition. Formally, they belong to the set of remaining sets $R_0 = \{S_1, \dots, S_k\}$. At every round $n$, the host picks a random $\omega_n \in \Omega$ with probability $\Pr(\omega_n = \omega) = \mu(\omega)/\mu(\Omega)$. Then, all sets $S_i$ that contain $\omega_n$ are eliminated from the game unless all remaining sets contain $\omega_n$, in which case no set is eliminated. That is, for $n \ge 1$,
$$R_n = \begin{cases} R_{n-1} \setminus \{S_i \in R_{n-1} : \omega_n \in S_i\}, & \text{if } \omega_n \notin S_i \text{ for some } S_i \in R_{n-1}; \\ R_{n-1}, & \text{otherwise.} \end{cases} \tag{1}$$
The last remaining set is declared the winner. We denote that winner by winner. We say that the cost of the game is the measure of the winning set, µ(winner).
We remark that $R_n$ cannot become empty (in which case the winner would not be defined) because of the "otherwise" clause in definition (1). We shall always assume that all sets $S_1, \dots, S_k$ are not only distinct and non-empty but also satisfy (a) for every $i$, $\mu(S_i) > 0$, and (b) for all $i \ne j$, $\mu(S_i \triangle S_j) > 0$ (here, $S_i \triangle S_j$ denotes the symmetric difference of sets $S_i$ and $S_j$). Then, in every game, there is a unique winner with probability 1.
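The rules of the game can be simulated directly. The following minimal sketch assumes a discrete space given as a dictionary `mu` mapping each element to its positive measure, and sets given as Python sets of elements; the names `play_game` and `measure` are hypothetical and used only for illustration.

```python
import random


def measure(mu, s):
    """Measure of a set: the sum of the measures of its elements."""
    return sum(mu[w] for w in s)


def play_game(mu, sets, rng=None):
    """Play one set elimination game; return the index of the winner.

    Assumes the sets are distinct with positive measure, so a unique
    winner emerges with probability 1.
    """
    rng = rng or random.Random()
    elements = list(mu)
    weights = [mu[w] for w in elements]
    remaining = list(range(len(sets)))
    while len(remaining) > 1:
        # Draw omega_n with probability mu(omega) / mu(Omega).
        w = rng.choices(elements, weights=weights)[0]
        survivors = [i for i in remaining if w not in sets[i]]
        # Sets containing w are eliminated unless every remaining
        # set contains w (then survivors is empty and nothing changes).
        if survivors:
            remaining = survivors
    return remaining[0]


mu = {"a": 1.0, "b": 1.0}
sets = [{"a"}, {"a", "b"}]
winner = play_game(mu, sets)
print(winner, measure(mu, sets[winner]))  # 0 1.0
```

In this small instance the set `{"a"}` always wins: drawing `a` eliminates nothing because both sets contain it, while drawing `b` eliminates `{"a", "b"}`.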
We similarly define the set elimination game for arbitrary finite measure spaces: For an arbitrary finite measure space (Ω, µ), element ω n is chosen with probability function Pr(ω n ∈ S) = µ(S)/µ(Ω).
Our main result is the following theorem, which, as we discuss later in Section 2.1, implies that the competitive ratio of the explainable clustering algorithm is at most $2 \ln k + 2$.
Theorem 2.1. Consider a set elimination game with the finite measure space $(\Omega, \mu)$ and $k$ distinct sets $S_1, S_2, \dots, S_k$ (as above). The expected cost of the game is at most
$$\mathbf{E}[\mu(\mathrm{winner})] \le (2 \ln k + 2) \cdot \min_{i \in [k]} \mu(S_i).$$
To simplify the exposition, we will prove this theorem for discrete finite measure spaces. If $\Omega$ is not a discrete measure space, we first replace it with a quotient space: We say that $\omega' \in \Omega$ and $\omega'' \in \Omega$ are equivalent ($\omega' \sim \omega''$) if they are contained in exactly the same sets among $S_1, \dots, S_k$. This equivalence relation partitions $\Omega$ into at most $2^k$ different equivalence classes. We replace $\Omega$ with the quotient space $\Omega/\!\sim$ whose elements are equivalence classes. In other words, we merge all equivalent $\omega$'s. The measure of a new element $\omega$ equals the measure of the corresponding equivalence class.
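The merging step can be illustrated on a finite example. The sketch below assumes, for simplicity, that the original space is itself given as a finite dictionary `mu` (in the continuous case one would measure each equivalence class instead of summing); the name `quotient_space` is hypothetical. Each equivalence class is identified by its signature, the set of indices $i$ such that the class is contained in $S_i$.

```python
from collections import defaultdict


def quotient_space(mu, sets):
    """Merge elements that lie in exactly the same sets.

    Returns a dict mapping each signature (a frozenset of indices i
    with omega in S_i) to the total measure of its equivalence class.
    """
    merged = defaultdict(float)
    for w, weight in mu.items():
        signature = frozenset(i for i, s in enumerate(sets) if w in s)
        merged[signature] += weight
    return dict(merged)


mu = {"a": 1.0, "a2": 3.0, "b": 2.0, "c": 0.5, "d": 4.0}
sets = [{"a", "a2", "b"}, {"b", "c"}]
q = quotient_space(mu, sets)
print(q[frozenset({0})])  # 4.0, since "a" and "a2" are merged
```

In the quotient space, the set $S_i$ corresponds to all signatures that contain the index $i$, and the number of classes is at most $2^k$ (here, 4 classes for $k = 2$).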
Organization. In Section 2.1, we discuss the connection between explainable k-medians and set elimination games. We define a set elimination game in a set system I ⊂ {S 1 , . . . , S k } in Section 2.2. Then, we define the hitting and elimination time in Section 2.3. In Section 3, we first illustrate our proof strategy by showing Theorem 2.1 for the case when the smallest set S 1 does not overlap with S 2 , . . . , S k . An important ingredient of our proof is the notion of surprise sets, which we discuss in Section 3.1. Finally, we complete the proof of Theorem 2.1 in Section 3.2.

Section: Explainable k-Medians via Set Elimination Game
In this section, we show how to use Theorem 2.1 to obtain a bound of 2 ln k + 2 on the competitive ratio of the RANDOMCOORDINATECUT algorithm.
Theorem 2.2. The competitive ratio of the RANDOMCOORDINATECUT algorithm for Explainable k-Medians is at most $2 \ln k + 2$. That is, for every set of centers $C = \{c_1, \dots, c_k\}$ and data set $X$, the algorithm finds a random decision tree $T$ such that
$$\mathbf{E}[\mathrm{cost}(X, T)] \le (2 \ln k + 2) \cdot \sum_{x \in X} \min_{c \in \{c_1, \dots, c_k\}} \|x - c\|_1.$$
The pseudo-code for the RANDOMCOORDINATECUT algorithm is provided in Figure 2.
Theorem 2.2 shows that given any k centers C = {c 1 , . . . , c k }, RANDOMCOORDINATECUT finds a decision tree T with cost at most 2 ln k + 2 times the cost of unconstrained k-medians with centers C = {c 1 , . . . , c k }. By using k centers given by any constant approximation algorithm for k-medians, RANDOMCOORDINATECUT finds a decision tree with cost at most O(log k) times the optimal unconstrained k-medians cost. This implies an O(log k) upper bound on the price of explainability.
Proof of Theorem 2.2. Consider an arbitrary data set $X \subset \mathbb{R}^d$ and a set of $k$ centers $C \subset \mathbb{R}^d$. We assume that all points in $X$ and all centers in $C$ lie in the cube $[-M, M]^d$. The threshold decision tree obtained by the RANDOMCOORDINATECUT algorithm partitions the space into $k$ cells. Each cell contains a single reference center $c_i$. The center $c_i$ is not necessarily optimal for cluster $P_i$ (cluster $P_i$ is the intersection of the data set $X$ and the $i$-th cell). However, we will use it as a proxy for the optimal center. In other words, we will upper bound the cost of the threshold decision tree as follows:
$$\mathrm{cost}(X, T) \equiv \min_{\hat{c}_1, \dots, \hat{c}_k} \sum_{i=1}^{k} \sum_{x \in P_i} \|x - \hat{c}_i\|_1 \le \sum_{i=1}^{k} \sum_{x \in P_i} \|x - c_i\|_1.$$
Let Ω be the set of all coordinate cuts:
Ω = {(j, θ) : j ∈ [d], θ ∈ [-M, M ]}.
We define a measure $\mu$ on $\Omega$ as follows. For every subset $S \subset \Omega$, we set
$$\mu(S) = \sum_{j=1}^{d} \mu_L(\{\theta : (j, \theta) \in S\}),$$
where $\mu_L$ is the Lebesgue measure on $\mathbb{R}$. Thus, we have $\mu(\Omega) = 2dM$, which implies that $(\Omega, \mu)$ is a finite measure space.
Consider any data point $x \in X$. Define $k$ sets $S_1, S_2, \dots, S_k$ for the set elimination game. For every $i \in \{1, \dots, k\}$, let $S_i$ be the set of all threshold cuts that separate $x$ and center $c_i$, i.e.,
$$S_i = \{(j, \theta) \in \Omega : \mathrm{sign}(x_j - \theta) \ne \mathrm{sign}(c^i_j - \theta)\},$$
where $c^i_j$ denotes the $j$-th coordinate of $c_i$.
Note that the $\ell_1$ distance from $x$ to center $c_i$ equals the measure of $S_i$: $\|x - c_i\|_1 = \mu(S_i)$. We now examine the set elimination game with sets $S_1, \dots, S_k$, measure space $(\Omega, \mu)$, and random sequence of draws $\omega_1, \omega_2, \dots$ (each $\omega_n \in \Omega$ is the threshold cut chosen by the RANDOMCOORDINATECUT algorithm at step $n$). We claim that $S_i$ belongs to $R_n$ if and only if center $c_i$ lies in the same cell as point $x$ after step $n$ of the algorithm. This is the case for $n = 0$, since $R_0$ contains all sets $S_1, \dots, S_k$ and the root of the threshold tree contains all centers $c_1, \dots, c_k$. Then, whenever we pick cut $\omega_n$, all centers separated from $x$ by $\omega_n$ are removed from the cell of $x$. The only exception to this rule occurs when all centers in that cell lie on the same side of the cut $\omega_n$. That is exactly the same rule as we have for the set elimination game (note that center $c_i$ is separated from $x$ by $\omega_n$ if and only if $\omega_n \in S_i$). Therefore, the sets $S_i$ remaining in the game and the centers $c_i$ remaining in the cell of $x$ have the same indices.
The RANDOMCOORDINATECUT algorithm stops when every leaf of the decision tree contains exactly one center. At this step, the set elimination game contains one set, $S_i$. This set corresponds to the center $c_i$ assigned to point $x$. The cost of the game $\mu(S_i)$ equals the distance from $x$ to $c_i$. Writing $\mathrm{cost}(x, T)$ for the distance from $x$ to its assigned reference center, Theorem 2.1 gives
$$\mathbf{E}[\mathrm{cost}(x, T)] = \mathbf{E}[\mu(\mathrm{winner})] \le (2 \ln k + 2) \cdot \min_i \mu(S_i) = (2 \ln k + 2) \cdot \min_i \|x - c_i\|_1.$$
We sum this bound over all data points x in X and get the desired result.
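The identity $\|x - c_i\|_1 = \mu(S_i)$ used in the proof can be checked coordinate by coordinate. The following short derivation is a sketch under the proof's assumption that $x$ and $c_i$ lie in $[-M, M]^d$; here $c^i_j$ denotes the $j$-th coordinate of $c_i$, and the numeric values at the end are illustrative only.

```latex
% A cut (j, \theta) separates x and c_i exactly when \theta lies
% between x_j and c^i_j (the endpoints have Lebesgue measure zero):
\mu_L\bigl(\{\theta : (j,\theta) \in S_i\}\bigr) = |x_j - c^i_j| .
% Summing over the coordinates gives
\mu(S_i) = \sum_{j=1}^{d} |x_j - c^i_j| = \|x - c_i\|_1 .
% Example with d = 2, x = (1, 4), c_i = (3, 2):
% coordinate 1 contributes the interval (1, 3) of length 2,
% coordinate 2 contributes the interval (2, 4) of length 2,
% so \mu(S_i) = 4 = |1 - 3| + |4 - 2| .
```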

Section: Local Competitions
We now revisit the definition of the set elimination game and define competitions in subsets of $\{S_1, \dots, S_k\}$. For the rest of the proof, we assume that $(\Omega, \mu)$ is a discrete finite measure space. We remind the reader that every set elimination game is determined by an infinite sequence of i.i.d. random variables $\omega_1, \omega_2, \dots$. In each round $n$, we sample an element $\omega_n$ from $\Omega$ with probability $\Pr(\omega_n = \omega) = \mu(\omega)/\mu(\Omega)$.
Definition 2.3. Consider a finite measure space $(\Omega, \mu)$. Let $I$ be a set of subsets of $\Omega$. We say that $I$ is a valid set system if (a) for every $S \in I$, $\mu(S) > 0$, and (b) for every two distinct sets $S', S'' \in I$, $\mu(S' \triangle S'') > 0$.
The reader may assume that $\mu(\omega) > 0$ for all $\omega$ in $\Omega$. Then, the definition above says that in a valid set system $I$, all sets are non-empty and distinct.
Definition 2.4. Consider a finite measure space $(\Omega, \mu)$. Let $\omega_1, \omega_2, \dots$ be i.i.d. random variables as described above and $I$ be a valid set system. We define a set elimination game in $I$. Initially, $R_0(I) = I$. Then, for every $n \ge 1$,
$$R_n(I) = \begin{cases} R_{n-1}(I) \setminus \{S \in R_{n-1}(I) : \omega_n \in S\}, & \text{if } \omega_n \notin S' \text{ for some } S' \in R_{n-1}(I); \\ R_{n-1}(I), & \text{otherwise.} \end{cases} \tag{2}$$
The winner of the game in $I$, denoted by $\mathrm{winner}(I)$, is the only element remaining, or, formally, the unique element in $\bigcap_{n \ge 0} R_n(I)$. If $\bigcap_{n \ge 0} R_n(I)$ contains more than one element, then the winner is not defined. The cost of the game is the measure of the winner, $\mu(\mathrm{winner}(I))$.
We remark that $\bigcap_{n \ge 0} R_n(I)$ contains exactly one element with probability 1. Thus, the winner and cost of the game are defined with probability 1.
Consider sets $S_1, \dots, S_k$ from Theorem 2.1. Denote $K = \{S_1, \dots, S_k\}$. The definition of the competition among sets $S_1, \dots, S_k$ (given in the beginning of Section 2) is exactly the same as the definition of competition in $K$. Our goal is to show that $\mathbf{E}[\mu(\mathrm{winner}(K))] \le 2(\ln k + 1) \cdot \min_{S_i \in K} \mu(S_i)$. In the proof of Theorem 2.1, we will consider competitions in different set systems $I \subseteq K$, all driven by the same sequence of draws $\omega_1, \omega_2, \dots$. We show the following key lemma, whose proof we defer to Appendix A.
Lemma 2.5. Consider a partitioning of the set system $K = \{S_1, \dots, S_k\}$ into $m$ sets $I_1, \dots, I_m$. Then, $\mathrm{winner}(K) \in \{\mathrm{winner}(I_1), \dots, \mathrm{winner}(I_m)\}$.
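Lemma 2.5 can be checked on small instances by replaying one fixed sequence of draws in $K$ and in each part of a partition. The sketch below is illustrative only: it truncates the infinite sequence to a finite list of draws, represents sets as a dictionary from hypothetical names to Python sets, and returns `None` when the finite prefix does not yet determine a winner.

```python
def winner(sets, draws):
    """Winner of the game restricted to `sets` for a fixed draw sequence.

    sets: dict mapping a set's name to its elements.
    draws: finite list of drawn elements (a prefix of omega_1, omega_2, ...).
    """
    remaining = list(sets)
    for w in draws:
        survivors = [name for name in remaining if w not in sets[name]]
        if survivors:  # otherwise every remaining set contains w
            remaining = survivors
    return remaining[0] if len(remaining) == 1 else None


def local_winners(sets, parts, draws):
    """Winner in the full system and the winners in each part."""
    overall = winner(sets, draws)
    local = [winner({n: sets[n] for n in part}, draws) for part in parts]
    return overall, local


K = {"S1": {1}, "S2": {2}, "S3": {2, 3}}
parts = [["S1"], ["S2", "S3"]]
print(local_winners(K, parts, [2, 1, 3]))  # ('S1', ['S1', 'S2'])
```

In this run, the draw 2 eliminates both `S2` and `S3` from the game in $K$, so `S1` wins overall. In the part $\{S_2, S_3\}$ the same draw eliminates nothing, and `S2` wins only after 3 is drawn. The overall winner is one of the local winners, as the lemma states.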

Section: Set Elimination with Exponential Clock
Consider a set elimination game on sets $S_1, \dots, S_k$. It is determined by the sequence of random i.i.d. draws $\omega_1, \omega_2, \dots$. Random variable $\omega_n$ is chosen in round $n$. We assign every round a random time $\tau_n$. Let the time between two consecutive rounds be an exponential random variable with parameter $\mu(\Omega)$. Specifically, let $\Delta\tau_1, \Delta\tau_2, \dots$ be a sequence of i.i.d. exponential random variables with parameter $\mu(\Omega)$ and let
$$\tau_n = \tau_{n-1} + \Delta\tau_n = \Delta\tau_1 + \dots + \Delta\tau_n.$$
Note that all $\Delta\tau_n$ are positive and $\tau_1, \tau_2, \dots$ is an increasing sequence with probability 1. The number of draws that occur by time $t$ (i.e., $N_t(\Omega) = |\{n : \tau_n \le t\}|$) is a Poisson process with parameter $\mu(\Omega)$. We can now think of the set elimination game as follows: The host of the game observes a Poisson process with parameter $\mu(\Omega)$. Whenever the process jumps (at time $\tau_n$), the host picks an element $\omega_n$ in $\Omega$ with probability $\Pr(\omega_n = \omega) = \mu(\omega)/\mu(\Omega)$ and eliminates some sets according to the rules of the game discussed above. Note that by assigning every round some time $\tau_n$, we do not change the game, the winner, or the cost of the game (because the sequence of random draws $\omega_1, \omega_2, \dots$ remains the same as before). This interpretation of the game allows us to introduce a hitting time $h(S)$ of every subset $S \subset \Omega$ with the following properties: (a) each $h(S)$ is an exponential random variable with rate $\mu(S)$; (b) hitting times of disjoint sets are mutually independent random variables.
Definition 2.6. For every subset $X \subset \Omega$, the hitting time $h(X)$ is the time $\tau_n$ when the first $\omega_n$ is drawn from $X$: $h(X) = \min\{\tau_n : \omega_n \in X\}$. When the set contains one element $\omega$, we will write $h(\omega)$ instead of $h(\{\omega\})$.
We also define the elimination time of each set $S_i$.
Definition 2.7. Consider any set elimination game with the measure space $(\Omega, \mu)$ and $k$ sets $S_1, S_2, \dots, S_k$ in $\Omega$. The elimination time $e(S_i)$ of set $S_i$ is the time when set $S_i$ is eliminated from the game, i.e., $e(S_i) = \min\{\tau_n : S_i \notin R_n(K)\}$. If $S_i$ is the winner, then we let $e(S_i) = \infty$ (because the winner is never eliminated).
Let us examine bound (3). Let Surprise be the set of all surprise sets. Note that Surprise is a random set. Then,
$$\sum_{i=2}^{k} \Pr[S_i = \mathrm{winner}(K)] \cdot \mu(S_i) \le \sum_{i=2}^{k} \Pr[S_i = \mathrm{winner}(K),\ S_i \notin \mathrm{Surprise}] \cdot \mu(S_i) + \sum_{i=2}^{k} \Pr[S_i \in \mathrm{Surprise}] \cdot \mu(S_i). \tag{5}$$
We show in the next section (Lemma 3.3) that the second sum is upper bounded by $\mu(S_1)$. We now bound the first sum. For every winner $S_i$ which is not a surprise set, we have $e(S_i) \ge h(S_1)$ (because $S_i$ is the winner) and $h(S_1) \le L/\mu(S_i)$ (because $S_i$ is not a surprise set). We also have $S_i = \mathrm{winner}(I^-)$, thus
$$\Pr[S_i = \mathrm{winner}(K),\ S_i \notin \mathrm{Surprise}] \le \Pr[h(S_1) \le L/\mu(S_i) \text{ and } S_i = \mathrm{winner}(I^-)].$$
By Lemma 2.9, all hitting times $h(S_i) = \min_{\omega \in S_i} h(\omega)$ for $i \ge 2$ are independent of $h(S_1)$. Thus, $\mathrm{winner}(I^-)$ is also independent of $h(S_1)$ ($\mathrm{winner}(I^-)$ depends only on the hitting times for sets $S_i \in I^-$). Therefore,
$$\Pr[S_i = \mathrm{winner}(K),\ S_i \notin \mathrm{Surprise}] \le \Pr[h(S_1) \le L/\mu(S_i)] \cdot \Pr[S_i = \mathrm{winner}(I^-)] = \bigl(1 - e^{-L\mu(S_1)/\mu(S_i)}\bigr) \cdot \Pr[S_i = \mathrm{winner}(I^-)] \le \Pr[S_i = \mathrm{winner}(I^-)] \cdot L \cdot \mu(S_1)/\mu(S_i),$$
where the last step uses $1 - e^{-y} \le y$ with $y = L\mu(S_1)/\mu(S_i)$.
We combine all bounds on the terms of (5) and get the following bound on the expected cost of the game:
$$\mu(S_1) + \sum_{i=2}^{k} \Pr[S_i = \mathrm{winner}(I^-)] \cdot L \cdot \mu(S_1) + \mu(S_1) = (L + 2) \cdot \mu(S_1) = (\ln k + 2) \cdot \mu(S_1).$$
This concludes the proof of the theorem for the case when S 1 does not overlap with S 2 , . . . , S k . We now analyze surprise sets.

Section: Surprise Sets
In this section, we prove a bound on the probability that a set $S_i$ is a surprise set. We no longer assume that $S_1$ does not intersect the other sets $S_i$. We first show a lemma about exponential random variables.
Lemma 3.2. Let $X$ and $Y$ be two independent exponential random variables with positive parameters $\lambda_X$ and $\lambda_Y$, respectively. Then, for every $T \ge 0$, we have
$$\Pr[Y \ge X \ge T] = \frac{\lambda_X}{\lambda_X + \lambda_Y} \cdot e^{-(\lambda_X + \lambda_Y)T}. \tag{6}$$
Proof. The desired probability can be found by computing $\int_T^\infty (F_X(t) - F_X(T)) f_Y(t)\,dt$, where $F_X(t) = 1 - e^{-\lambda_X t}$ is the cumulative distribution function of $X$, and $f_Y(t) = \lambda_Y \cdot e^{-\lambda_Y t}$ is the probability density function of $Y$. Here, we give an alternative proof. Write
$$\Pr[Y \ge X \ge T] = \Pr[Y \ge X \text{ and } \min(X, Y) \ge T] = \Pr[X \le Y \mid \min(X, Y) \ge T] \cdot \Pr[\min(X, Y) \ge T].$$
We have $\Pr[\min(X, Y) \ge T] = e^{-(\lambda_X + \lambda_Y)T}$, because the minimum of two independent exponential random variables with parameters $\lambda_X$ and $\lambda_Y$ is an exponential random variable with parameter $\lambda_X + \lambda_Y$. Then, $\Pr[X \le Y \mid \min(X, Y) \ge T] = \Pr[X \le Y]$ because the exponential distribution is memoryless; and $\Pr[X \le Y] = \lambda_X/(\lambda_X + \lambda_Y)$.
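For completeness, the direct computation mentioned at the start of the proof can be carried out as follows. This is a routine calculation added as a sketch; it uses only the distribution function and density stated in the proof.

```latex
\begin{align*}
\Pr[Y \ge X \ge T]
  &= \int_T^\infty \bigl(F_X(t) - F_X(T)\bigr) f_Y(t)\, dt
   = \int_T^\infty \bigl(e^{-\lambda_X T} - e^{-\lambda_X t}\bigr)
     \lambda_Y e^{-\lambda_Y t}\, dt \\
  &= e^{-\lambda_X T} e^{-\lambda_Y T}
     - \frac{\lambda_Y}{\lambda_X + \lambda_Y}\,
       e^{-(\lambda_X + \lambda_Y) T} \\
  &= \Bigl(1 - \frac{\lambda_Y}{\lambda_X + \lambda_Y}\Bigr)
     e^{-(\lambda_X + \lambda_Y) T}
   = \frac{\lambda_X}{\lambda_X + \lambda_Y}\,
     e^{-(\lambda_X + \lambda_Y) T}.
\end{align*}
```

Both routes give the right-hand side of (6).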
Lemma 3.3. For every set $S_i$, we have
$$\Pr(S_i \text{ is a surprise set}) \le \frac{1}{k} \cdot \frac{\mu(S_1)}{\mu(S_i)}.$$
Proof. First, we show that $\min(e(S_i), h(S_1)) \le h(S_i \setminus S_1)$.
Claim 3.4. We always have $\min(e(S_i), h(S_1)) \le h(S_i \setminus S_1)$.
Proof. Consider an arbitrary realization of the game and the time $t = h(S_i \setminus S_1)$ when $S_i \setminus S_1$ is hit. If by this time $S_1$ has already been hit, then $h(S_1) < t$. Similarly, if by this time $S_i$ has already been eliminated, then $e(S_i) < t$. Otherwise, both $S_1$ and $S_i$ are still remaining in the game at time $t$. Therefore, when we pick $\omega \in S_i \setminus S_1$ at time $t$, set $S_i$ gets eliminated (since $\omega \in S_i$, $\omega \notin S_1$, and both $S_1$ and $S_i$ are remaining in the game). Thus, in this case, $e(S_i) = t$. This concludes the proof.
If $S_i$ is a surprise set, then $\min(e(S_i), h(S_1)) = h(S_1) \ge L/\mu(S_i)$. By Claim 3.4, we have
$$h(S_i \setminus S_1) \ge \min(e(S_i), h(S_1)) = h(S_1) \ge L/\mu(S_i).$$
Thus, $\Pr(S_i \text{ is a surprise set}) \le \Pr[h(S_i \setminus S_1) \ge h(S_1) \ge L/\mu(S_i)]$. By Lemma 3.2 applied to the independent exponential random variables $h(S_1)$ and $h(S_i \setminus S_1)$, and time $T = L/\mu(S_i)$, we have
$$\Pr(S_i \text{ is a surprise set}) \le \frac{\mu(S_1)}{\mu(S_i \setminus S_1) + \mu(S_1)} \cdot e^{-\frac{L(\mu(S_i \setminus S_1) + \mu(S_1))}{\mu(S_i)}} \le \frac{1}{k} \cdot \frac{\mu(S_1)}{\mu(S_i)}.$$
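The last inequality in the proof can be unpacked as follows. This is a short sketch of the omitted step, using only $L = \ln k$ and the fact that $S_i \setminus S_1$ and $S_1$ are disjoint with union $S_i \cup S_1$.

```latex
% Disjointness gives
\mu(S_i \setminus S_1) + \mu(S_1) = \mu(S_i \cup S_1) \ge \mu(S_i).
% Hence the prefactor satisfies
\frac{\mu(S_1)}{\mu(S_i \setminus S_1) + \mu(S_1)}
  \le \frac{\mu(S_1)}{\mu(S_i)},
% and the exponent is at least L, so
e^{-\frac{L(\mu(S_i \setminus S_1) + \mu(S_1))}{\mu(S_i)}}
  \le e^{-L} = e^{-\ln k} = \frac{1}{k}.
% Multiplying the two bounds yields the claim of Lemma 3.3.
```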

Section: General Case
Proof of Theorem 2.1. We upper bound the expected cost of the game for arbitrary sets S 1 , . . . , S k .
As before, we assume that $S_1$ is the smallest set. We remind the reader that each hitting time $h(S_i)$ is an exponential random variable with parameter $\mu(S_i)$. In the proof, we use the definition of surprise sets (see Definition 3.1) with $L = \ln k$. We call every set $S_i$ with $i \ne 1$ that is not a surprise set a non-surprise set.
We separately upper bound the cost of the winner depending on whether the winner is (a) set $S_1$, (b) a surprise set, or (c) a non-surprise set. Write
$$\mathbf{E}[\mu(\mathrm{winner}(K))] = \underbrace{\mathbf{E}\bigl[\mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{\mathrm{winner}(K) = S_1\}\bigr]}_{(a)} + \underbrace{\mathbf{E}\bigl[\mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{\text{winner is a surprise set}\}\bigr]}_{(b)} + \underbrace{\mathbf{E}\bigl[\mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{\text{winner is a non-surprise set}\}\bigr]}_{(c)}.$$
Term (a) is upper bounded by $\mu(S_1)$. We bound term (b) using Lemma 3.3: The probability that $S_i$ is a surprise set is at most $\frac{1}{k} \cdot \mu(S_1)/\mu(S_i)$. Thus, the expected total measure of all surprise sets (not only the surprise winner) is upper bounded by $\frac{1}{k} \sum_{i=2}^{k} \frac{\mu(S_1)}{\mu(S_i)} \mu(S_i) = \frac{k-1}{k} \mu(S_1) < \mu(S_1)$. We now bound term (c). Define a new random variable: Let $\mathrm{cost}(\omega)$ be the cost of the winner (i.e., $\mu(S_i)$, where $S_i$ is the winner) if (1) the winner is a non-surprise set, and (2) $\omega$ is the first element that was chosen in $S_1$. We let $\mathrm{cost}(\omega) = 0$ otherwise. If $\omega$ is the first element that was chosen in $S_1$, then $h(S_1) = h(\omega)$. So, the definition of $\mathrm{cost}(\omega)$ can be written as follows:
$$\mathrm{cost}(\omega) = \mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{h(S_1) = h(\omega)\} \cdot \mathbf{1}\{\mathrm{winner}(K) \notin \mathrm{Surprise}\}.$$
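Since the hitting time $h(S_1)$ is finite with probability 1, exactly one $\omega \in S_1$ satisfies $h(S_1) = h(\omega)$, so the term (c) equals $\sum_{\omega \in S_1} \mathbf{E}[\mathrm{cost}(\omega)]$. Lemma 3.5 bounds each summand by $2L\mu(\omega)$, so the term (c) is at most $2L\mu(S_1)$. To prove this bound, fix $\omega \in S_1$. If $S_i$ is a non-surprise set, then $h(S_1) < L/\mu(S_i)$ or $e(S_i) < h(S_1)$. If $S_i$ is the winner, then $e(S_i) \ge h(S_1)$. Thus, if $S_i$ is a non-surprise winner, then $h(S_1) < L/\mu(S_i)$. This observation gives us the following upper bound on $\mathbf{E}[\mathrm{cost}(\omega)]$: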
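$$\mathbf{E}[\mathrm{cost}(\omega)] \le \sum_{i=2}^{k} \mu(S_i) \cdot \Pr[S_i = \mathrm{winner}(K) \text{ and } h(\omega) = h(S_1) < L/\mu(S_i)]. \tag{8}$$
Define two set systems $I^-_\omega$ and $I^+_\omega$ of sets $S_i$ not containing and containing $\omega$, respectively: $I^-_\omega = \{S_i : \omega \notin S_i \text{ and } i \ge 2\}$; $I^+_\omega = \{S_i : \omega \in S_i \text{ and } i \ge 2\}$. Note that $K \equiv \{S_1, \dots, S_k\} = \{S_1\} \cup I^-_\omega \cup I^+_\omega$. By Lemma 2.5, $\mathrm{winner}(K) \in \{S_1, \mathrm{winner}(I^-_\omega), \mathrm{winner}(I^+_\omega)\}$. Observe that if $S_i$ with $i \ge 2$ is the winner, then $S_i = \mathrm{winner}(I^-_\omega)$ or $S_i = \mathrm{winner}(I^+_\omega)$. We replace the condition $S_i = \mathrm{winner}(K)$ with $S_i \in \{\mathrm{winner}(I^-_\omega), \mathrm{winner}(I^+_\omega)\}$ in (8) and get the bound:
$$\mathbf{E}[\mathrm{cost}(\omega)] \le \sum_{i=2}^{k} \mu(S_i) \cdot \Pr\Bigl[S_i \in \{\mathrm{winner}(I^-_\omega), \mathrm{winner}(I^+_\omega)\} \text{ and } h(\omega) < \frac{L}{\mu(S_i)}\Bigr].$$
The key observation now is that the sets $\mathrm{winner}(I^-_\omega)$ and $\mathrm{winner}(I^+_\omega)$ are independent of $h(\omega)$. This is the case because the sets remaining in the competitions $R_n(I^-_\omega)$ and $R_n(I^+_\omega)$ do not change when we select $\omega$: no set in $I^-_\omega$ contains $\omega$, and every set in $I^+_\omega$ contains $\omega$.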
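Using that $h(\omega)$ is an exponential random variable with parameter $\mu(\omega)$, we get (for every $i$)
$$\mu(S_i) \cdot \Pr\Bigl[h(\omega) \le \frac{L}{\mu(S_i)}\Bigr] = \mu(S_i) \cdot \Bigl(1 - e^{-L\frac{\mu(\omega)}{\mu(S_i)}}\Bigr) \le \mu(S_i) \cdot L \frac{\mu(\omega)}{\mu(S_i)} = \mu(\omega) L.$$
Hence,
$$\mathbf{E}[\mathrm{cost}(\omega)] \le \mu(\omega) L \cdot \sum_{i=2}^{k} \Pr[S_i \in \{\mathrm{winner}(I^-_\omega), \mathrm{winner}(I^+_\omega)\}].$$
The sum on the right-hand side is at most 2. Thus, $\mathbf{E}[\mathrm{cost}(\omega)] \le 2L\mu(\omega)$.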
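The final counting step and the way the three terms combine can be spelled out as follows. This is a sketch that only rearranges bounds already stated in the text, with $L = \ln k$.

```latex
% Each of the two local games has exactly one winner (with probability 1), so
\sum_{i=2}^{k} \Pr\bigl[S_i \in \{\mathrm{winner}(I^-_\omega),
                                   \mathrm{winner}(I^+_\omega)\}\bigr]
  \le \sum_{i=2}^{k} \Pr\bigl[S_i = \mathrm{winner}(I^-_\omega)\bigr]
    + \sum_{i=2}^{k} \Pr\bigl[S_i = \mathrm{winner}(I^+_\omega)\bigr]
  \le 1 + 1 = 2 .
% Summing E[cost(\omega)] \le 2 L \mu(\omega) over \omega \in S_1 bounds term (c):
\sum_{\omega \in S_1} \mathbf{E}[\mathrm{cost}(\omega)]
  \le 2L \sum_{\omega \in S_1} \mu(\omega) = 2L\,\mu(S_1) .
% Together with (a) \le \mu(S_1) and (b) < \mu(S_1):
\mathbf{E}[\mu(\mathrm{winner}(K))]
  \le (1 + 1 + 2L)\,\mu(S_1) = (2\ln k + 2)\,\mu(S_1) .
```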

Section: Acknowledgments and Disclosure of Funding
The authors are supported by NSF Awards CCF-1955351, CCF-1934931, EECS-2216970.

Section: 
Note that $e(S_i) \ge h(S_i)$. Sometimes, $e(S_i)$ may be equal to $h(S_i)$, but $e(S_i)$ and $h(S_i)$ are not always the same. We now prove that hitting times for disjoint sets are independent. To this end, we split the Poisson process $N_t(\Omega) = |\{n : \tau_n \le t\}|$. Let $N_t(\omega) = |\{n : \tau_n \le t \text{ and } \omega_n = \omega\}|$. It is easy to see that $N_t(\Omega) = \sum_{\omega \in \Omega} N_t(\omega)$ for every $t$. It is also true that each $N_t(\omega)$ is a Poisson process with parameter $\mu(\omega)$ and all $N_t(\omega)$ (for $\omega \in \Omega$) are mutually independent. This fact follows from the Coloring Theorem (see e.g., Kingman [1992], Coloring Theorem, page 53).
Theorem 2.8 (Coloring Theorem). Let $\Pi_t$ be a Poisson process on the real line with rate $\lambda$. We color each event of the Poisson process randomly with one of $M$ colors: The probability that a point receives the $i$-th color is $p_i$. The colors of different points are independent. Let $\Pi_t(i)$ be the number of events of color $i$ in the interval $(0, t]$. Then, $\Pi_t(1), \dots, \Pi_t(M)$ are independent Poisson processes. The rate of process $\Pi_t(i)$ is $\lambda p_i$.
Lemma 2.9. For every $\omega \in \Omega$, $h(\omega)$ is an exponential random variable with parameter $\mu(\omega)$, and all random variables $h(\omega)$ (for $\omega \in \Omega$) are mutually independent.
Proof. Observe that $h(\omega) = \min\{t : N_t(\omega) \ge 1\}$. Thus, $h(\omega)$ is an exponential random variable (the time of the first jump of a Poisson process) with rate $\mu(\omega)$. Also, since all $N_t(\omega)$ (for $\omega \in \Omega$) are mutually independent, all $h(\omega)$ are also mutually independent.
Note that the set elimination game depends only on the hitting times for elements $\omega$ in $\Omega$. This is the case because it matters only when every $\omega$ is drawn for the first time. At that time (the hitting time of $\omega$), all sets that contain $\omega$ are eliminated unless all remaining sets contain this $\omega$. When the same $\omega$ is drawn again, it does not eliminate any new sets. Also, note that for any set $S \subset \Omega$, the hitting time is $h(S) = \min_{\omega \in S} h(\omega)$. Thus, $h(S)$ is an exponential random variable with parameter $\mu(S) = \sum_{\omega \in S} \mu(\omega)$.
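Because the game depends only on the hitting times of the elements, it can be simulated by sampling one independent exponential variable per element and processing the elements in increasing order of hitting time. The sketch below illustrates this view; the function names are hypothetical, `mu` is assumed to map each element to a positive measure, and `random.expovariate` is used to draw an exponential variable with the given rate.

```python
import math
import random


def sample_hitting_times(mu, rng):
    """Independent h(omega) ~ Exp(mu(omega)) for every element."""
    return {w: rng.expovariate(rate) for w, rate in mu.items()}


def set_hitting_time(s, hitting_times):
    """h(S) = min over omega in S of h(omega)."""
    return min(hitting_times[w] for w in s)


def play_with_hitting_times(sets, hitting_times):
    """Return the remaining set indices and all elimination times e(S_i)."""
    remaining = list(range(len(sets)))
    elimination = {i: math.inf for i in remaining}
    # Only the first draw of each element matters, so process the
    # elements once, in increasing order of hitting time.
    for w in sorted(hitting_times, key=hitting_times.get):
        survivors = [i for i in remaining if w not in sets[i]]
        if survivors:
            for i in remaining:
                if w in sets[i]:
                    elimination[i] = hitting_times[w]
            remaining = survivors
    return remaining, elimination


sets = [{"a"}, {"b"}, {"b", "c"}]
h = {"a": 0.5, "b": 0.2, "c": 0.9}
print(play_with_hitting_times(sets, h))
# ([0], {0: inf, 1: 0.2, 2: 0.2})
```

In this run the element `b` is hit first and eliminates both sets containing it, so the first set wins and its elimination time stays infinite. For every set, the elimination time is at least `set_hitting_time` of that set, in line with $e(S_i) \ge h(S_i)$.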

Section: Proof of Main Result
We now present the proof of our main result, Theorem 2.1. We assume without loss of generality that $S_1$ is the smallest set, i.e., $\mu(S_1) \le \mu(S_i)$ for all $i$. Then, the expected cost of the game is at most
$$\mathbf{E}[\mu(\mathrm{winner}(K))] \le \mu(S_1) + \sum_{i=2}^{k} \Pr[S_i = \mathrm{winner}(K)] \cdot \mu(S_i). \tag{3}$$
We first provide some intuition for the proof by considering the case when $S_1$ does not intersect the sets $S_2, \dots, S_k$, i.e., sets $S_1$ and $S_i$ are disjoint for all $i = 2, 3, \dots, k$. We split all sets into two groups: $S_1$ and the rest of the sets $S_2, \dots, S_k$. We know from Lemma 2.5 that the winner among all sets $S_1, \dots, S_k$ is either $S_1$ or $\mathrm{winner}(\{S_2, \dots, S_k\})$. Denote $I^- = \{S_2, \dots, S_k\}$. Each set $S_i$ is eliminated at time $e(S_i)$. The set $S_1$ is eliminated at its hitting time $h(S_1)$ unless it is the only remaining set at time $h(S_1)$ (because we are considering the case when $S_1$ does not overlap with other sets). Thus,
$$\mathrm{winner}(K) = \begin{cases} S_1, & \text{if } h(S_1) > e(\mathrm{winner}(I^-)); \\ \mathrm{winner}(I^-), & \text{if } e(\mathrm{winner}(I^-)) > h(S_1). \end{cases} \tag{4}$$
When the winner among $S_1, \dots, S_k$ is not $S_1$, we consider two cases for the winner $S_i$: (1) $S_i$ is a surprise set; (2) $S_i$ is a non-surprise set.
Definition 3.1. We say that $S_i$ is a surprise set if $e(S_i) \ge h(S_1) \ge L/\mu(S_i)$, where $L = \ln k$.
We call $S_i$ a surprise set because the probability of the event $e(S_i) \ge h(S_1) \ge L/\mu(S_i)$ is small. We give a bound on the probability of this event in Lemma 3.3. Here, we provide some intuition. By Lemma 2.9, the hitting time $h(S_i)$ is an exponential random variable with parameter $\mu(S_i)$. Thus, the expected hitting time for $S_i$ is $1/\mu(S_i)$. Consider a set $S_i$ with a small measure ($\mu(S_i)$ is close to $\mu(S_1)$). If the hitting time $h(S_1) \ge L/\mu(S_i)$, then $h(S_1)$ is much larger than its expected value $1/\mu(S_1)$, which happens with a small probability. Now consider a set $S_i$ with a large measure $\mu(S_i) \gg \mu(S_1)$. Then, the expected hitting time for $S_i$ is $1/\mu(S_i)$, which is much smaller than the expected hitting time of $S_1$. Thus, the event $e(S_i) \ge h(S_1)$ occurs with a small probability.
A Proof of Lemma 2.5
Lemma 2.5. Consider a partitioning of the set system $K = \{S_1, \dots, S_k\}$ into $m$ sets $I_1, \dots, I_m$. Then, $\mathrm{winner}(K) \in \{\mathrm{winner}(I_1), \dots, \mathrm{winner}(I_m)\}$.
The proof of Lemma 2.5 relies on the following observation.
Lemma A.1. Let $X$ and $Y$ be two subsets of $K$. If $X \subset Y$, then for every $n$, we always have
$$R_n(Y) \cap X = R_n(X) \quad \text{or} \quad R_n(Y) \cap X = \emptyset. \tag{9}$$
Proof. We prove that (9) holds by induction on $n$. Initially, when $n = 0$, we have $R_0(Y) \cap X = Y \cap X = X = R_0(X)$. Suppose (9) holds for $n$; we prove that (9) also holds for $n + 1$. If $R_n(Y) \cap X = \emptyset$, then $R_{n'}(Y) \cap X$ remains empty for all $n' \ge n$. Therefore, (9) holds for $n + 1$. So, let us assume that $R_n(Y) \cap X = R_n(X)$. Consider three cases:
• If $\omega_{n+1}$ belongs to all sets in $R_n(Y)$, then it also belongs to all sets in $R_n(X) = R_n(Y) \cap X$. Thus, in this case, no set is eliminated in $R_n(X)$ or in $R_n(Y)$, and $R_{n+1}(Y) \cap X = R_{n+1}(X)$.
• If $\omega_{n+1}$ belongs to all sets in $R_n(X)$, but not all sets in $R_n(Y)$, then, at step $n + 1$, we remove all sets that contain $\omega_{n+1}$ and, particularly, all sets in $R_n(X)$, from $R_n(Y)$. Consequently, $R_{n+1}(Y) \cap X = \emptyset$.
• If not all sets in $R_n(X)$ and not all sets in $R_n(Y)$ contain $\omega_{n+1}$, then we remove exactly the same sets from both $R_n(X)$ and $R_n(Y) \cap X$. Namely, we remove the sets $\{S \in R_n(X) : \omega_{n+1} \in S\}$. Hence, $R_{n+1}(Y) \cap X = R_{n+1}(X)$.
We conclude that (9) holds for $n' = n + 1$.
Proof of Lemma 2.5. Consider an arbitrary realization of the game $\omega_1, \omega_2, \dots$. Let $n$ be the round when all sets but the winner are eliminated from the competition, i.e., $R_n(K)$ contains only one set, the winner. Since $K$ is the union of $I_1, \dots, I_m$, the winner must belong to some $I_j$. Now, by Lemma A.1 for $X = I_j$ and $Y = K$, we have $R_n(K) \cap I_j = R_n(I_j)$ or $R_n(K) \cap I_j = \emptyset$. We know that $R_n(K) = \{\mathrm{winner}(K)\}$ and $\mathrm{winner}(K) \in I_j$. Thus, $R_n(K) \cap I_j = \{\mathrm{winner}(K)\} \ne \emptyset$, and $R_n(I_j) = R_n(K) \cap I_j = \{\mathrm{winner}(K)\}$.
We conclude that at round $n$, $R_n(I_j)$ contains only one set, the winner in $K$. Consequently, it is also the winner in $I_j$, i.e., $\mathrm{winner}(I_j) = \mathrm{winner}(K)$. This finishes the proof.


References:
[b0] Sayan Bandyapadhyay; Fedor Fomin; Petr A Golovach; William Lochet; Nidhi Purohit; Kirill Simonov (2022). How to find a good explanation for clustering. 
[b1] Jarosław Byrka; Thomas Pensyl; Bartosz Rybicki; Aravind Srinivasan; Khoa Trinh (2017). An improved approximation for k-median and positive correlation in budgeted optimization. ACM Transactions on Algorithms (TALG)
[b2] Moses Charikar; Lunjia Hu (2022). Near-optimal explainable k-means for all dimensions. SIAM
[b3] Moses Charikar; Sudipto Guha; Éva Tardos; David B Shmoys (1999). A constant-factor approximation algorithm for the k-median problem. 
[b4] Vincent Cohen-Addad; Euiwoong Lee (2022). Johnson coverage hypothesis: Inapproximability of k-means and k-median in lp-metrics. SIAM
[b5] Vincent Cohen-Addad; Hossein Esfandiari; Vahab Mirrokni; Shyam Narayanan (2022). Improved approximations for euclidean k-means and k-median, via nested quasi-independent sets. 
[b6] Sanjoy Dasgupta; Nave Frost; Michal Moshkovitz; Cyrus Rashtchian (2020). Explainable k-means and k-medians clustering. 
[b7] Hossein Esfandiari; Vahab Mirrokni; Shyam Narayanan (2022). Almost tight approximation algorithms for explainable clustering. SIAM
[b8] Nave Frost; Michal Moshkovitz; Cyrus Rashtchian (2020). ExKMC: Expanding explainable k-means clustering. 
[b9] Buddhima Gamlath; Xinrui Jia; Adam Polak; Ola Svensson (2021). Nearly-tight and oblivious algorithms for explainable clustering. Advances in Neural Information Processing Systems
[b10] Anupam Gupta; Madhusudhan Reddy Pittu; Ola Svensson; Rachel Yuan (2023). The price of explainability for clustering. 
[b11] John Frank Charles Kingman (1992). Poisson processes. Clarendon Press
[b12] Eduardo Laber; Lucas Murtinho; Felipe Oliveira (2023). Shallow decision trees for explainable k-means clustering. Pattern Recognition
[b13] Eduardo S Laber; Lucas Murtinho (2021). On the price of explainability for some clustering problems. PMLR
[b14] Shi Li; Ola Svensson (2013). Approximating k-median via pseudo-approximation. 
[b15] Konstantin Makarychev; Liren Shan (2021). Near-optimal algorithms for explainable k-medians and k-means. PMLR
[b16] Konstantin Makarychev; Liren Shan (2022). Explainable k-means: don't be greedy, plant bigger trees. 
[b17] Nimrod Megiddo; Kenneth J Supowit (1984). On the complexity of some common geometric location problems. SIAM journal on computing

Figures:
Figure fig_0: 1
Type: figure
Caption: Figure 1: The unconstrained k-medians clustering and explainable k-medians clustering. The left diagram shows the Voronoi partition of the plane w.r.t. three centers in ℓ 1 distance. The Voronoi cell for each center consists of all points that are closer (in ℓ 1 distance) to this center than to any other center (the boundaries between cells are not straight lines because we use the ℓ 1 distance). The middle diagram shows an explainable partition. The right diagram shows the corresponding decision tree for explainable clustering.
Data: 

Figure fig_1: 
Type: figure
Caption: which we prove below, gives a bound of $2L\mu(S_1)$ on the expression above. Combining the upper bounds on terms (a), (b), and (c), we get $\mathbb{E}[\mu(\mathrm{winner}(K))] \le (1 + 2L + 1)\,\mu(S_1) = (2\ln k + 2) \cdot \mu(S_1)$. Lemma 3.5. For every $\omega \in S_1$, we have $\mathbb{E}[\mathrm{cost}(\omega)] \le 2L\mu(\omega)$.
Data: 

Figure fig_2: 
Type: figure
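Caption: Proof. We have $\mathbb{E}[\mathrm{cost}(\omega)] = \mathbb{E}\big[\mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{h(S_1) = h(\omega)\} \cdot \mathbf{1}\{\mathrm{winner}(K) \notin \mathrm{Surprise}\}\big]$. (7)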
Data: 

Figure fig_3: 
Type: figure
Caption: The set $R_n(I^-_\omega)$ does not change in the round $n$ when $\omega$ is chosen because no set $S_i$ in $R_n(I^-_\omega) \subset I^-_\omega$ contains $\omega$. The set $R_n(I^+_\omega)$ does not change in this round because all sets $S_i$ in $R_n(I^+_\omega) \subset I^+_\omega$ contain $\omega$ and, consequently, when $\omega$ is chosen, none of these sets is removed from $R_n(I^+_\omega)$ (otherwise, $R_n(I^+_\omega)$ would become empty). Thus, $\mathbb{E}[\mathrm{cost}(\omega)] \le \sum_{i=2}^{k} \mu(S_i) \cdot \Pr\{S_i \in \{\mathrm{winner}(I^-_\omega), \mathrm{winner}(I^+_\omega)\}\} \cdot \Pr\{h(\omega) < L/\mu(S_i)\}$
Data: 


Formulas:
Formula formula_0: $c^1, \ldots, c^k$, $\mathbb{E}[\mathrm{cost}(X, T)] \le O(\log k \log\log k)$ • Input: a data set $X \subset \mathbb{R}^d$ and a set of centers $C = \{c^1, c^2, \ldots, c^k\} \subset \mathbb{R}^d$. Output: a threshold tree $T$. Create a tree $T_0$ containing a root node $r$. Assign $C_r = \{c^1, c^2, \ldots, c^k\}$ to the root. Let $t = 0$. Let $M = \max_{i,j} |c^i_j|$.

Formula formula_1: $\mathrm{Left} = \{c \in C_u : c_j \le \theta\}$ and $\mathrm{Right} = \{c \in C_u : c_j > \theta\}$.

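Formula formula_2: $R_n = \begin{cases} R_{n-1} \setminus \{S_i \in R_{n-1} : \omega_n \in S_i\}, & \text{if for some } S_i \in R_{n-1},\ \omega_n \notin S_i; \\ R_{n-1}, & \text{otherwise.} \end{cases}$

Formula formula_3: $\mathbb{E}[\mu(\mathrm{winner})] \le (2\ln k + 2) \cdot \min_{i \in [k]} \mu(S_i)$

Formula formula_4: $\mathbb{E}[\mathrm{cost}(X, T)] \le (2\ln k + 2) \cdot \sum_{x \in X} \min_{c \in \{c^1, \ldots, c^k\}} \|x - c\|_1.$

Formula formula_5: $\mathrm{cost}(X, T) \equiv \min_{\hat{c}^1, \ldots, \hat{c}^k} \sum_{i=1}^{k} \sum_{x \in P_i} \|x - \hat{c}^i\|_1 \le \sum_{i=1}^{k} \sum_{x \in P_i} \|x - c^i\|_1.$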

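Formula formula_6: $\Omega = \{(j, \theta) : j \in [d],\ \theta \in [-M, M]\}.$

Formula formula_7: $\mu(S) = \sum_{j=1}^{d} \mu_L(\{\theta : (j, \theta) \in S\}),$

Formula formula_8: $S_i = \{(j, \theta) \in \Omega : \mathrm{sign}(x_j - \theta) \neq \mathrm{sign}(c^i_j - \theta)\}.$

Formula formula_9: $\mathbb{E}[\mathrm{cost}(x, T)] = \mathbb{E}[\mu(\mathrm{winner})] \le (2\ln k + 2) \cdot \min_i \mu(S_i) = (2\ln k + 2) \cdot \min_i \|x - c^i\|_1.$

Formula formula_10: $R_n(I) = \begin{cases} R_{n-1}(I) \setminus \{S \in R_{n-1}(I) : \omega_n \in S\}, & \text{if for some } S' \in R_{n-1}(I),\ \omega_n \notin S'; \\ R_{n-1}(I), & \text{otherwise.} \end{cases}$

Formula formula_11: $\tau_n = \tau_{n-1} + \Delta\tau_n = \Delta\tau_1 + \cdots + \Delta\tau_n.$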

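Formula formula_12: $\sum_{i=2}^{k} \Pr\{S_i = \mathrm{winner}(K)\}\,\mu(S_i) \le \sum_{i=2}^{k} \Pr\{S_i = \mathrm{winner}(K),\ S_i \notin \mathrm{Surprise}\} \cdot \mu(S_i) + \sum_{i=2}^{k} \Pr\{S_i \in \mathrm{Surprise}\} \cdot \mu(S_i). \quad (5)$

Formula formula_13: $\Pr\{S_i = \mathrm{winner}(K),\ S_i \notin \mathrm{Surprise}\} \le \Pr\{h(S_1) \le L/\mu(S_i)\} \cdot \Pr\{S_i = \mathrm{winner}(I^-)\} = \left(1 - e^{-L\mu(S_1)/\mu(S_i)}\right) \cdot \Pr\{S_i = \mathrm{winner}(I^-)\}$

Formula formula_14: $\le \Pr\{S_i = \mathrm{winner}(I^-)\} \cdot L \cdot \mu(S_1)/\mu(S_i).$

Formula formula_15: $\mu(S_1) + \sum_{i=2}^{k} \Pr\{S_i = \mathrm{winner}(I^-)\} \cdot L \cdot \mu(S_1) + \mu(S_1) = (L + 2) \cdot \mu(S_1) = (\ln k + 2) \cdot \mu(S_1).$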

Formula formula_16: $\Pr\{Y \ge X \ge T\} = \frac{\lambda_X}{\lambda_X + \lambda_Y} \cdot e^{-(\lambda_X + \lambda_Y)T}. \quad (6)$

Formula formula_17: $\int_T^{\infty} (F_X(t) - F_X(T))\, f_Y(t)\,dt$, where $F_X(t) = 1 - e^{-\lambda_X t}$ is the cumulative distribution function of $X$, and $f_Y(t) = \lambda_Y \cdot e^{-\lambda_Y t}$

Formula formula_18: $\Pr\{Y \ge X \ge T\} = \Pr\{Y \ge X \text{ and } \min(X, Y) \ge T\} = \Pr\{X \le Y \mid \min(X, Y) \ge T\} \cdot \Pr\{\min(X, Y) \ge T\}.$

Formula formula_19: $\lambda_X + \lambda_Y$. Then, $\Pr\{X \le Y \mid \min(X, Y) \ge T\} = \Pr\{X \le Y\}$ because the exponential distribution is memoryless; and $\Pr\{X \le Y\} = \lambda_X/(\lambda_X + \lambda_Y)$.

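Formula formula_20: $\Pr(S_i \text{ is surprise set}) \le \frac{1}{k} \cdot \frac{\mu(S_1)}{\mu(S_i)}.$

Formula formula_21: $h(S_i \setminus S_1) \ge \min\{e(S_i), h(S_1)\} = h(S_1) \ge L/\mu(S_i).$

Formula formula_22: $\Pr(S_i \text{ is surprise set}) \le \frac{\mu(S_1)}{\mu(S_i \setminus S_1) + \mu(S_1)} \cdot e^{-\frac{L(\mu(S_i \setminus S_1) + \mu(S_1))}{\mu(S_i)}} \le \frac{1}{k} \cdot \frac{\mu(S_1)}{\mu(S_i)}.$

Formula formula_23: $\mathbb{E}[\mu(\mathrm{winner}(K))] = \underbrace{\mathbb{E}[\mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{\mathrm{winner}(K) = S_1\}]}_{(a)} + \underbrace{\mathbb{E}[\mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{\text{winner is surprise set}\}]}_{(b)} + \underbrace{\mathbb{E}[\mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{\text{winner is non-surprise set}\}]}_{(c)}.$

Formula formula_24: $\mathrm{cost}(\omega) = \mu(\mathrm{winner}(K)) \cdot \mathbf{1}\{h(S_1) = h(\omega)\} \cdot \mathbf{1}\{\mathrm{winner}(K) \notin \mathrm{Surprise}\}.$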

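Formula formula_25: $\mathbb{E}[\mathrm{cost}(\omega)] \le \sum_{i=2}^{k}$

Formula formula_27: $\mathbb{E}[\mathrm{cost}(\omega)] \le \sum_{i=2}^{k}$

Formula formula_28: $\mu(S_i) \cdot \Pr\left\{h(\omega) \le \frac{L}{\mu(S_i)}\right\} = \mu(S_i) \cdot \left(1 - e^{-L\frac{\mu(\omega)}{\mu(S_i)}}\right) \le \mu(S_i) \cdot L\,\frac{\mu(\omega)}{\mu(S_i)} = \mu(\omega) L.$

Formula formula_29: $\mathbb{E}[\mathrm{cost}(\omega)] \le \mu(\omega) L \cdot \sum_{i=2}^{k}$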
