Fast T-overlap query algorithms using graphics processor units and its applications in web data query
Abstract: Given a collection of sets and a query set, a T-Overlap query identifies all sets that share at least T elements with the query. The T-Overlap query is the foundation of set similarity queries and joins, and it plays an important role in web data querying and processing, such as behavior analysis of web users and near-duplicate detection of web documents. To answer T-Overlap queries efficiently, we design GPU-based algorithms rather than relying on traditional CPU-based ones. We first design an inverted index on the GPU, then choose ScanCount, a straightforward but efficient T-Overlap algorithm, as the underlying algorithm for our GPU-based T-Overlap algorithms. Depending on whether queries are processed serially or in parallel, we propose three new algorithms built on our GPU-based inverted index. Among these three, GS-Parallel-Group processes a group of queries in parallel and supports a high degree of parallelism. Extensive experiments compare our GPU-based algorithms with state-of-the-art CPU-based algorithms. The results show that GS-Parallel-Group significantly outperforms the CPU-based algorithms.
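To make the T-Overlap definition and the ScanCount idea concrete, the following is a minimal single-threaded CPU sketch in Python. It is not the paper's GPU implementation: the function names `build_inverted_index` and `scan_count` and the toy collection are assumptions chosen for illustration. The sketch builds an inverted index from elements to set IDs, scans the inverted lists of the query's elements while incrementing one counter per set, and reports every set whose counter reaches T.

```python
from collections import defaultdict


def build_inverted_index(sets):
    """Map each element to the list of IDs of the sets that contain it."""
    index = defaultdict(list)
    for set_id, elements in enumerate(sets):
        for element in set(elements):  # de-duplicate within a set
            index[element].append(set_id)
    return index


def scan_count(index, num_sets, query, t):
    """Return IDs of sets sharing at least t elements with the query.

    ScanCount-style: keep one counter per set, scan the inverted list
    of every query element, and increment the counter of each set ID seen.
    """
    counts = [0] * num_sets
    for element in set(query):
        for set_id in index.get(element, ()):
            counts[set_id] += 1
    return [set_id for set_id, count in enumerate(counts) if count >= t]


# Hypothetical toy collection, for illustration only.
collection = [
    {"a", "b", "c"},
    {"b", "c", "d"},
    {"x", "y"},
    {"a", "c", "d", "e"},
]
index = build_inverted_index(collection)
result = scan_count(index, len(collection), {"a", "c", "d"}, 2)
print(result)
```

In this sketch each counter increment is independent of the others, which suggests why the approach lends itself to parallel execution; how the paper's three GPU algorithms actually distribute that work is not specified in the abstract.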
External IDs:dblp:journals/www/LiJYXQZ15