Continuous k-Regret Minimization Queries: A Dynamic Coreset Approach

Jiping Zheng, Wei Ma, Yanhao Wang, Xiaoyang Wang

Published: 01 Jan 2023, Last Modified: 03 Jun 2023IEEE Trans. Knowl. Data Eng. 2023Readers: Everyone

Abstract: Finding a small set of representative tuples from a large database is an important functionality for supporting multi-criteria decision making. Top- <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> queries and skyline queries are two widely studied queries to fulfill this task. However, both of them have some limitations: a top- <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> query requires the user to provide her utility functions for finding the <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> tuples with the highest scores as the result; a skyline query does not need any user-specified utility function but cannot control the result size. To overcome their drawbacks, the <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> -regret minimization query was proposed and received much attention recently, since it does not require any user-specified utility function and returns a fixed-size result set. Specifically, it selects a set <inline-formula><tex-math notation="LaTeX">$R$</tex-math></inline-formula> of tuples with a pre-defined size <inline-formula><tex-math notation="LaTeX">$r$</tex-math></inline-formula> from a database <inline-formula><tex-math notation="LaTeX">$D$</tex-math></inline-formula> such that the maximum <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula>-regret ratio , which captures how well the top-ranked tuple in <inline-formula><tex-math notation="LaTeX">$R$</tex-math></inline-formula> represents the top- <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> tuples in <inline-formula><tex-math notation="LaTeX">$D$</tex-math></inline-formula> for any possible utility function, is minimized. Although there have been many methods for <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> -regret minimization query processing, most of them are designed for static databases without tuple insertions and deletions. The only known algorithm to process continuous <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> -regret minimization queries (C <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> RMQ) in dynamic databases suffers from suboptimal approximation and high time complexity. In this paper, we propose a novel dynamic coreset-based approach, called DynCore , for C <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> RMQ processing. It achieves the same (asymptotically optimal) upper bound on the maximum <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula> -regret ratio as the best-known static algorithm. Meanwhile, its time complexity is sublinear to the database size, which is significantly lower than that of the existing dynamic algorithm. The efficiency and effectiveness of DynCore is confirmed by experimental results on real-world and synthetic datasets.

0 Replies