Hybrid Regret Minimization: A Submodular Approach

Jiping Zheng, Fanxu Meng, Yanhao Wang, Xiaoyang Wang, Sheng Wang, Yuan Ma, Zhiyang Hao

Published: 01 Jan 2024, Last Modified: 23 Jul 2025IEEE Trans. Knowl. Data Eng. 2024EveryoneRevisionsBibTeXCC BY-SA 4.0

Abstract: Regret minimization queries are important methods to extract representative tuples from databases. They have been extensively investigated in the last decade due to wide applications in multi-criteria decision making. For a given database $D$ and a class $\mathcal {F}$ of utility functions (e.g., all nonnegative linear functions), two typical regret minimization queries considered in existing studies are maximum regret minimization (MRM) and average regret minimization (ARM) queries, whereby a subset of $k$ tuples is selected from $D$ to minimize the maximum or average of regret ratios among all utility functions in $\mathcal {F}$ , respectively. However, due to the different properties of maximum and average regret ratios, the result of one query cannot fulfill the requirement of the other. To the best of our knowledge, there has not yet been any attempt to combine both queries. In this paper, we first introduce the hybrid regret minimization (HRM) query, which simultaneously minimizes the maximum and average regret ratios. We show that finding the optimal result for an HRM query is NP-hard, but it is possible to exploit submodularity for approximate HRM query processing. We propose an efficient asymptotic approximation algorithm based on submodular maximization to process HRM queries and several optimization techniques, such as memoization, lazy evaluation, and stochastic subsampling, to improve query efficiency. Furthermore, we consider extending a multiplicative weights update (MWU) algorithm for multi-objective submodular maximization to provide higher-quality results for HRM queries. Finally, we demonstrate that our proposed algorithms achieve better performance for HRM queries than existing methods specific to MRM and ARM queries through extensive experiments on real-world and synthetic datasets. Meanwhile, our proposed algorithms are efficient and scalable to large datasets.