An Efficient Framework for Exact Set Similarity Search Using Tree Structure Indexes

2017 (modified: 07 Feb 2025), ICDE 2017, Readers: Everyone
Abstract: Similarity search is an essential operation in many applications. Given a collection of set records and a query, exact set similarity search aims to find all records in the collection that are similar to the query. Existing methods adopt a filter-and-verify framework that makes use of inverted indexes. However, because the complexity of verification is rather low for set-based similarity metrics, they always fail to make a good tradeoff between filter power and filter cost. In this paper, we propose an efficient framework for exact set similarity search based on a tree index structure. We define a hash-based ordering to effectively import data into the index structure and then make optimizations to reduce the filter cost. To further improve the filter power, we propose a dynamic algorithm to partition the dataset into several parts, together with a multiple-index framework. Experimental results on real-world datasets show that our method significantly outperforms the state-of-the-art algorithms.
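
To make the problem statement concrete, the following is a minimal sketch of exact set similarity search in a filter-and-verify style. It is not the tree-index method of the paper: it assumes Jaccard similarity as the metric and uses the standard length filter as the filter step, and the names `jaccard`, `length_filter`, and `exact_search` are hypothetical, chosen only for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(a | b)


def length_filter(record, query, threshold):
    """Cheap necessary condition for jaccard(record, query) >= threshold.

    Since |a ∩ b| <= min(|a|, |b|) and |a ∪ b| >= max(|a|, |b|),
    a match requires threshold * |query| <= |record| <= |query| / threshold.
    """
    return (threshold * len(query) <= len(record)
            and threshold * len(record) <= len(query))


def exact_search(records, query, threshold):
    """Return the indices of all records whose similarity meets the threshold."""
    results = []
    for i, record in enumerate(records):
        # Filter: discard records that cannot possibly match.
        if not length_filter(record, query, threshold):
            continue
        # Verify: compute the exact similarity for surviving candidates.
        if jaccard(record, query) >= threshold:
            results.append(i)
    return results


# Illustrative data only; not taken from the paper.
records = [{"a", "b", "c"}, {"a", "b", "d"}, {"x", "y"}]
query = {"a", "b", "c", "d"}
matches = exact_search(records, query, 0.7)
```

In this sketch the third record is rejected by the length filter alone, while the first two pass the filter and are confirmed by verification. The tradeoff the abstract describes is visible even here: a stronger filter prunes more candidates but costs more to evaluate, and verification for set metrics is already cheap.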