On Hamming–Lipschitz Type Stability of the Subdominant (Minmax) Ultrametric: Theory and Simple Proofs

TMLR Paper7150 Authors

24 Jan 2026 (modified: 20 Apr 2026) · Under review for TMLR · CC BY 4.0
Abstract: The subdominant (minmax) ultrametric is a canonical tree-structured summary of a dissimilarity matrix, arising equivalently as the ultrametric induced by single-linkage clustering. While its classical stability theory is usually formulated in $\ell_\infty$ or Gromov--Hausdorff terms, such bounds are poorly suited to sparse perturbations that alter only a few pairwise distances. We develop an $\ell_0$-type stability theory for this operator. Our analysis shows that sparse edits propagate only through the minimum spanning tree: a pairwise ultrametric value can change only if its tree path crosses an edited edge or a cut newly exposed by an edited off-tree edge. This yields a sharp per-edit exposed-cut score and a tree-only global envelope, leading to Hamming--Lipschitz bounds on the number of ultrametric entries that can change. We also prove sharpness results showing that this dependence on tree geometry is unavoidable: under strict cut separation the tree-edge bound is attained exactly, and for off-tree edits there are explicit families in which one edited distance changes $\Theta(n^2)$ ultrametric entries. In addition, we prove a conditional near-additivity principle for multiple edits under certified large per-edit changed regions and negligible aggregate overlap. Experiments on deep-embedding graphs show that the resulting structural scores provide useful vulnerability diagnostics for hierarchical representations.
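To make the operator in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes the subdominant ultrametric as the minimax-path closure of a dissimilarity matrix and counts how many ultrametric entries change after a single edited distance. The function names `subdominant_ultrametric` and `hamming_change`, the dense $O(n^3)$ closure, and the toy matrix are illustrative assumptions. An MST-based computation would give the same values more efficiently.

```python
import numpy as np


def subdominant_ultrametric(D):
    """Minimax-path closure of a symmetric dissimilarity matrix D.

    U[i, j] is the smallest achievable value of the largest edge on any
    path from i to j, which equals the single-linkage merge height.
    Assumes D is symmetric with a zero diagonal (hypothetical helper).
    """
    U = np.array(D, dtype=float)
    n = U.shape[0]
    for k in range(n):
        # Allow paths through vertex k; a path costs its largest edge.
        via_k = np.maximum(U[:, k][:, None], U[k, :][None, :])
        U = np.minimum(U, via_k)
    return U


def hamming_change(D, i, j, new_value):
    """Count unordered pairs whose ultrametric value changes
    when the single entry D[i, j] is replaced by new_value."""
    D_edit = np.array(D, dtype=float)
    D_edit[i, j] = D_edit[j, i] = new_value
    before = subdominant_ultrametric(D)
    after = subdominant_ultrametric(D_edit)
    changed = ~np.isclose(before, after)
    return int(np.triu(changed, k=1).sum())


# Toy input: four points on a line at positions 0, 1, 3, 6 (an assumption
# made for illustration only, not data from the paper).
positions = np.array([0.0, 1.0, 3.0, 6.0])
D_toy = np.abs(positions[:, None] - positions[None, :])
U_toy = subdominant_ultrametric(D_toy)
n_changed_off_tree = hamming_change(D_toy, 0, 3, 0.5)   # shrink an off-tree edge
n_changed_tree = hamming_change(D_toy, 1, 2, 2.5)       # lengthen a tree edge
```

In this toy case the minimum spanning tree is the path 0-1-2-3, so editing the off-tree pair (0, 3) or the tree edge (1, 2) changes only pairs whose minimax path is rerouted or lengthened, which is the kind of localized propagation the Hamming--Lipschitz bounds are meant to quantify.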
Submission Type: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=NJS7QqvwKI&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: We have substantially revised the manuscript to foreground practical machine learning applications and to connect our theoretical bounds explicitly to downstream ML pipelines. The key additions and structural changes are as follows. $\textbf{New Addition:}$ Budgeted Active Verification $\textbf{(Appendix D)}$: We introduced a novel semi-supervised clustering task evaluated on datasets including MNIST, USPS, HAR, Olivetti, and OptDigits. This experiment demonstrates that $S_{\mathrm{union}}(e)$ serves as a computationally cheap and highly effective priority rule for human verification. $\textbf{ML Use Case:}$ When only a small number of cluster connections can be checked manually, our score indicates which edges to verify first in order to improve the global cluster structure the most. $\textbf{Substantiating Asymptotic Conditions (Corollary 1):}$ To demonstrate that the hypotheses of Corollary 1(ii) are non-vacuous, we introduced a "star of subtrees" toy example. This construction shows formally how simultaneous sparse edits ($k$ edits among $m$ branches, where $k = o(m)$) yield certified changed regions with asymptotically negligible overlap, driving the overlap ratio to 0. A detailed study is given in $\textbf{Appendix E}$ of the revised draft.
Assigned Action Editor: ~Akshay_Rangamani1
Submission Number: 7150