Streaming Algorithms for Diversity Maximization with Fairness ConstraintsDownload PDFOpen Website

2022 (modified: 20 Dec 2022)ICDE 2022Readers: Everyone
Abstract: Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$X$</tex> of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$n$</tex> elements, it asks to select a subset <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S$</tex> of <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$k\ll n$</tex> elements with maximum diversity, as quantified by the dissimilarities among the elements in S. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S$</tex> that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$X$</tex> is partitioned into <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$m$</tex> disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$S$</tex> contains k <inf xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">i</inf> elements from each group i є [1, m]. A streaming algorithm should process <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$X$</tex> sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\frac{1-\varepsilon}{4}$</tex> -approximate and specific for m = 2, where є E (0,1), and the second of which achieves a <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$\frac{1-\varepsilon}{3m+2}$</tex> -approximation for an arbitrary <tex xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">$m$</tex> . Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
0 Replies

Loading