Fair Clustering via Hierarchical Fair-Dirichlet Prior

Published: 03 Feb 2026, Last Modified: 03 Feb 2026
Venue: AISTATS 2026 Poster
License: CC BY 4.0
TL;DR: We present a novel model-based formulation of fair clustering: we rigorously define a notion of fair clustering at the population level and develop a Bayesian methodology that targets this population-level objective.
Abstract: The advent of ML-driven decision-making has led to an increasing focus on algorithmic fairness. The widespread utility of clustering has naturally prompted a proliferation of literature on fair clustering. A popular notion of fairness in clustering requires the clusters to be balanced, i.e., each level of a protected attribute must be approximately equally represented in each cluster. In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature, which is almost exclusively based on optimizing appropriate objective functions. We first rigorously define a notion of fair clustering at the population level and then develop a Bayesian methodology, equipped with a novel hierarchical prior specification, that targets the population-level objective by enforcing balance in the resulting clusters. In addition, we devise a scheme for principled performance evaluation of competing algorithms, leveraging a concrete notion of optimal recovery. We develop an efficient collapsed Gibbs sampler for the posterior; it integrates a novel scheme for non-uniform sampling from the space of binary matrices with fixed margins with a proposal guided by optimal transport. The proposed methodology shows superior empirical performance relative to the state of the art across numerical experiments, benchmark datasets, and gender-neutral fair clustering in the Distress Analysis Interview Corpus.
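As an illustration of the balance notion mentioned in the abstract, the sketch below computes a ratio-based balance score for a given cluster assignment. This is the sample-level measure commonly used in the fair clustering literature, assumed here for illustration; it is not necessarily the population-level definition developed in the paper. The function name `cluster_balance` and its arguments are hypothetical.

```python
from collections import Counter


def cluster_balance(labels, groups):
    """Ratio-based balance of a clustering (illustrative, not the paper's definition).

    labels: cluster assignment for each observation.
    groups: protected-attribute level for each observation.
    Returns (overall, per_cluster), where per_cluster[c] is the ratio of the
    least to the most represented protected level in cluster c, and overall
    is the minimum of these ratios over clusters.
    """
    if len(labels) != len(groups):
        raise ValueError("labels and groups must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one observation is required")

    levels = sorted(set(groups))

    # Count how often each protected level appears in each cluster.
    counts = {}
    for c, g in zip(labels, groups):
        counts.setdefault(c, Counter())[g] += 1

    per_cluster = {}
    for c, cnt in counts.items():
        # A level absent from a cluster contributes a count of zero.
        sizes = [cnt.get(level, 0) for level in levels]
        # Every cluster in `counts` is non-empty, so max(sizes) >= 1.
        per_cluster[c] = min(sizes) / max(sizes)

    return min(per_cluster.values()), per_cluster
```

A score of 1 means every protected level is equally represented in every cluster, while a score of 0 means at least one cluster contains no members of some level.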
Submission Number: 684