MulRF: A Multi-Dimensional Range Filter for Sublinear Time Range Query Processing

Published: 01 Jan 2024, Last Modified: 06 Nov 2025. Venue: IEEE Trans. Knowl. Data Eng. 2024. License: CC BY-SA 4.0
Abstract: Range queries are an important operation on big multi-dimensional data. This paper studies multi-dimensional range query filtering, which speeds up range query processing by avoiding reads of useless data. Existing one-dimensional range filters cannot filter multi-dimensional range queries efficiently, so a novel multi-dimensional range filter is proposed. Based on this filter, an efficient range query processing algorithm is presented that directly returns the locations of the I/O units containing data in the query result, without any access to the input dataset. The time complexity of the algorithm is $O(3^{m}h)$, where $h$ is the number of I/O units partially overlapping with a range query and $m$ is the number of dimensions. Since $m$ is usually $o(\sqrt{\log n})$, the algorithm runs in sublinear time if $V=O(n)$, where $n$ is the size of the input dataset, $V=\prod_{i=1}^{m}d_{i}$, and $d_{i}$ is the number of distinct values on the $i$-th dimension of the dataset for $1\leq i\leq m$. Experimental results show that the multi-dimensional range filter has a low false positive rate and good filtering efficiency. The proposed range query processing algorithm achieves at least a 3$\sim$7 times improvement over algorithms based on one-dimensional filters on different datasets.
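The role of the assumption $m=o(\sqrt{\log n})$ in the complexity bound can be made explicit with a short worked step. The derivation below is a sketch that uses only the quantities defined in the abstract. It shows that the $3^{m}$ factor is subpolynomial in $n$. It makes no claim about how $h$ relates to $n$ or $V$, since the abstract does not state that relation. The numeric instance at the end ($n=2^{64}$, $m=4$) is an illustrative assumption, not a figure from the paper.

```latex
% The exponential factor 3^m rewritten in base 2:
\[
  3^{m} = 2^{m \log_2 3}.
\]
% With m = o(\sqrt{\log n}), the exponent grows more slowly than \log n:
\[
  m \log_2 3 = o\!\left(\sqrt{\log n}\right) \subseteq o(\log n)
  \quad\Longrightarrow\quad
  3^{m} = 2^{o(\log n)} = n^{o(1)}.
\]
% Hence the stated bound can be read as
\[
  O\!\left(3^{m} h\right) = O\!\left(n^{o(1)}\, h\right).
\]
% Illustrative instance (assumed values): n = 2^{64} gives \sqrt{\log_2 n} = 8;
% for m = 4 the factor is 3^{4} = 81, negligible next to n \approx 1.8 \times 10^{19}.
```

Read this way, the dependence on the number of dimensions contributes only an $n^{o(1)}$ factor, so the running time is governed by $h$, the number of I/O units that partially overlap the query.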