Top-k representative queries with binary constraints

Published: 01 Jan 2015, Last Modified: 25 Feb 2025. SSDBM 2015. CC BY-SA 4.0.
Abstract: Given a collection of binary constraints that determine whether a data object is relevant or not, we consider the problem of online retrieval of the top-k objects that best represent all other relevant objects in the underlying dataset. Such top-k representative queries naturally arise in a wide range of complex data analytic applications, including advertisement, search, and recommendation. In this paper, we aim to identify the top-k representative objects that are high-scoring, satisfy diverse subsets of the given binary constraints, and are representative of various other relevant objects in the dataset. We formulate our problem with the well-established notion of top-k representative skylines, and we show that the problem is NP-hard. Hence, we design efficient techniques to solve our problem with theoretical performance guarantees. As a by-product of our algorithm, we also improve the asymptotic time complexity of skyline computation to log-linear time in the number of data points when all dimensions except one are binary in nature. Our empirical results attest that the proposed method efficiently finds high-quality top-k representative objects, while our technique is an order of magnitude faster than state-of-the-art methods for finding the top-k skylines with binary constraints.
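To make the skyline setting concrete: when every dimension except the score is binary, dominance reduces to a score comparison plus a bitmask-superset test. The sketch below is not the paper's algorithm (the abstract does not describe it); it is a standard table-based alternative under stated assumptions: higher scores are better, satisfying a constraint (bit = 1) is better, and the number of constraints d is small enough for a 2^d table. It records, for every mask, the best score among objects whose mask is a superset (a sum-over-subsets style DP), which gives O(n·d + d·2^d) time: linear in n but exponential in d, so it is only practical for small d. The function name `binary_skyline` and the `(score, mask)` input layout are hypothetical.

```python
from typing import List, Tuple

NEG_INF = float("-inf")


def binary_skyline(objects: List[Tuple[float, int]], d: int) -> List[int]:
    """Return the indices of skyline (non-dominated) objects.

    Assumptions (illustrative only): objects[i] = (score, mask), a higher
    score is better, and bit j of mask is 1 when the object satisfies
    binary constraint j, which is treated as better than 0.
    p dominates q iff mask_p is a superset of mask_q, score_p >= score_q,
    and p is strictly better in at least one dimension.
    """
    size = 1 << d
    # exact[m]: best score among objects whose mask is exactly m.
    exact = [NEG_INF] * size
    for score, mask in objects:
        if score > exact[mask]:
            exact[mask] = score

    # sup[m]: best score among objects whose mask is a superset of m
    # (including m itself), via a superset-maximum DP over the d bits.
    sup = exact[:]
    for bit in range(d):
        for m in range(size):
            if not m & (1 << bit):
                sup[m] = max(sup[m], sup[m | (1 << bit)])

    skyline = []
    for i, (score, mask) in enumerate(objects):
        # Best score among objects with a strict superset of this mask:
        # every strict superset contains mask plus at least one extra bit.
        strict = max(
            (sup[mask | (1 << b)] for b in range(d) if not mask & (1 << b)),
            default=NEG_INF,
        )
        # Dominated by a strict superset with score >= ours, or by an
        # object with the same mask and a strictly higher score.
        dominated = strict >= score or exact[mask] > score
        if not dominated:
            skyline.append(i)
    return skyline
```

For example, with d = 2 and objects `[(0.9, 0b01), (0.7, 0b11), (0.5, 0b01), (0.8, 0b10), (0.7, 0b11), (0.6, 0b00)]`, the call returns `[0, 1, 3, 4]`: the two tied `(0.7, 0b11)` objects do not dominate each other, while `(0.5, 0b01)` and `(0.6, 0b00)` are dominated.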