Fair-Center Clustering on Massive Social Network Data Streams

Longkun Guo, Chaoqi Jia, Chao Chen

Published: 2026, Last Modified: 08 May 2026, Venue: WWW 2026, License: CC BY-SA 4.0

Abstract: Center-based clustering is a fundamental technique with many real-world applications, including social network analysis, but it may inadvertently discriminate against certain populations based on factors such as age, gender, or socioeconomic status, particularly when nodes are associated with sensitive attributes. In this work, we study the problem of fair k-center clustering in the streaming setting, which seeks to select representative items from a large data stream while respecting group-representation fairness. Given an input dataset in Euclidean space partitioned into m disjoint groups, the fairness constraint requires that the number of centers selected from each group does not exceed a given upper bound. The goal is to select a set of centers that minimizes the maximum distance from any point to its nearest center (the k-center objective) while satisfying the fairness constraint. We present a one-pass streaming algorithm with approximation ratio 4.46, improving on the previous best ratio of (5+ε) for this problem in general metrics. Notably, our result establishes that streaming fair k-center admits a strictly better approximation ratio in Euclidean space than in general metrics, in contrast to the standard k-center problem, whose best-known approximation ratio is 2 in both Euclidean and general metric spaces. Finally, we complement our theoretical results with an empirical evaluation on five real-world social network datasets and million-scale synthetic datasets, demonstrating significant improvements over state-of-the-art methods in clustering quality while maintaining comparable runtime efficiency.
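To make the problem statement concrete, the following is a minimal Python sketch of the two quantities the abstract defines: the k-center objective (the maximum distance from any point to its nearest center) and the per-group upper-bound fairness check. It is not the paper's streaming algorithm; the function names `kcenter_cost` and `is_fair`, the toy points, the group labels, and the caps are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def kcenter_cost(points, centers):
    """k-center objective: largest distance from a point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)

def is_fair(center_groups, caps):
    """True if no group contributes more centers than its upper bound.

    Groups missing from `caps` are treated as having a cap of 0 (an assumption).
    """
    counts = Counter(center_groups)
    return all(counts[g] <= caps.get(g, 0) for g in counts)

# Toy instance (illustrative data only): four points on a line, two groups.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
groups = ["a", "a", "b", "b"]
caps = {"a": 1, "b": 1}          # at most one center per group

chosen = [0, 2]                   # indices of the selected centers
centers = [points[i] for i in chosen]
cost = kcenter_cost(points, centers)
fair = is_fair([groups[i] for i in chosen], caps)
```

In this toy instance, choosing one center from each group satisfies the caps, whereas choosing both centers from group `a` would violate them and also leave the points of group `b` far from any center.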

External IDs: dblp:conf/www/GuoJC26