Abstract: As a fundamental graph operator, the reachability query has been widely studied by the data mining community over the past decades. In a directed acyclic graph (DAG), one vertex is reachable from another if there exists a chain of directed edges leading from the latter to the former. Most state-of-the-art (SOTA) reachability query methods first index all the vertices in the underlying DAG and assign them labels, and then use these indexes and/or labels to efficiently filter out as many unreachable queries as possible. Because a large portion of unreachable queries can be identified without invoking any costly path-finding process, the overall time taken by a huge number of queries is much shortened, at a tolerable cost in additional preprocessing time and space for the indexes and/or labels. In this paper, we propose the Extreme Labeling Filter (ELF), a novel generic filter that can be applied to existing reachability query methods to identify a large number of additional unreachable queries. By analyzing the given DAG in a systematic and autonomous manner, ELF first determines whether to use predecessors or successors to label the vertices. Based on these self-determined labels, ELF is then able to identify a large number of unreachable queries in O(1) time per query. To evaluate the performance of ELF, we apply it to 4 reachability query methods (1 conventional and 3 SOTA, all designed for reachability queries in DAGs) and conduct experiments on 17 datasets of different sizes. The experimental results show that by applying ELF, all methods significantly shorten the query time.
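
The general filter-then-search pattern described above can be illustrated with a minimal Python sketch. It is not ELF itself, whose labeling scheme is not specified in this abstract; it swaps in a well-known topological-rank label as a stand-in O(1) filter. The function names `build_index` and `reachable`, and the convention that a vertex reaches itself, are assumptions made for illustration only.

```python
from collections import deque


def build_index(n, edges):
    """Preprocessing (illustrative stand-in, not ELF): build adjacency lists
    and label each vertex with its position in a topological order."""
    adj = [[] for _ in range(n)]
    indeg = [0] * n
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1

    # Kahn's algorithm: repeatedly remove vertices with no incoming edges.
    queue = deque(v for v in range(n) if indeg[v] == 0)
    rank = [0] * n
    pos = 0
    while queue:
        u = queue.popleft()
        rank[u] = pos
        pos += 1
        for w in adj[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if pos != n:
        raise ValueError("input graph is not a DAG")
    return adj, rank


def reachable(adj, rank, src, dst):
    """Answer whether dst is reachable from src."""
    if src == dst:
        return True  # assumption: a vertex is considered to reach itself
    # O(1) filter: in a DAG, every vertex reachable from src has a larger
    # topological rank, so a smaller-ranked dst is certainly unreachable.
    if rank[src] > rank[dst]:
        return False
    # Fallback path-finding (DFS) for queries the filter cannot decide.
    stack, seen = [src], {src}
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w == dst:
                return True
            # Vertices ranked after dst cannot lie on a path to dst.
            if w not in seen and rank[w] < rank[dst]:
                seen.add(w)
                stack.append(w)
    return False


if __name__ == "__main__":
    adj, rank = build_index(4, [(0, 1), (1, 2), (0, 3)])
    print(reachable(adj, rank, 0, 2))  # answered by the DFS fallback
    print(reachable(adj, rank, 2, 0))  # rejected by the O(1) filter
```

In this sketch the filter only rejects queries; any query it cannot decide still falls through to the search, which is the role a generic filter plays when layered on top of an existing method.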