Abstract: Approximate string search and join are essential operations in data integration and cleaning, and they have attracted significant attention in academia. In this paper, we study string similarity search and join under edit distance constraints. Although multicore machines have become the mainstream computer architecture, most existing methods are designed to run on a single processor. To address this problem, we propose a novel parallel framework based on the Burrows-Wheeler transform (BWT). We also devise an efficient technique that exploits the cache to further improve performance. Our method solves both similarity search and similarity join efficiently within one general framework. We conducted a comprehensive experimental study to demonstrate the efficiency of our method.
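To make the two problems concrete, the following is a minimal brute-force sketch of similarity search and join under an edit distance threshold `tau`. It is only an illustration of the problem definitions. It is not the BWT-based parallel framework described above, and all function names are hypothetical. The verification step uses the standard dynamic program with two common shortcuts: a length filter, and early termination once every value in a DP row exceeds `tau`.

```python
def edit_distance_at_most(s: str, t: str, tau: int) -> bool:
    """Hypothetical helper: True if edit distance(s, t) <= tau."""
    # Length filter: |len(s) - len(t)| is a lower bound on the edit distance.
    if abs(len(s) - len(t)) > tau:
        return False
    prev = list(range(len(t) + 1))  # DP row for the empty prefix of s
    for i in range(1, len(s) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        # Row minima never decrease, so stop once the whole row exceeds tau.
        if min(cur) > tau:
            return False
        prev = cur
    return prev[-1] <= tau


def similarity_search(query: str, strings: list[str], tau: int) -> list[str]:
    """Hypothetical: all strings within edit distance tau of the query."""
    return [s for s in strings if edit_distance_at_most(query, s, tau)]


def similarity_join(r: list[str], s: list[str], tau: int) -> list[tuple[str, str]]:
    """Hypothetical: all pairs (a, b) from r x s within edit distance tau."""
    return [(a, b) for a in r for b in s if edit_distance_at_most(a, b, tau)]


print(similarity_search("kitten", ["sitting", "kitten", "mitten", "knitting"], 2))
# ['kitten', 'mitten']
print(similarity_join(["abc", "xyz"], ["abd", "xyz", "abcd"], 1))
# [('abc', 'abd'), ('abc', 'abcd'), ('xyz', 'xyz')]
```

The join compares every pair of strings, which is quadratic in the input size. This cost is why filtering and index structures such as the BWT-based approach in this paper matter at scale.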