A Review for Weighted MinHash Algorithms (Extended abstract)

Published: 01 Jan 2023, Last Modified: 12 Jan 2024 | ICDE 2023 | Readers: Everyone
Abstract: Data similarity computation is a fundamental research topic that underpins many high-level applications based on similarity measures. However, exact similarity computation becomes daunting in large-scale real-world scenarios. MinHash is a popular technique for efficiently estimating the Jaccard similarity of binary sets, and weighted MinHash extends it to estimate the generalized Jaccard similarity of weighted sets. This review categorizes and discusses existing weighted MinHash algorithms. We have also developed a Python toolbox for the algorithms and released it on GitHub.
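
To make the two quantities named in the abstract concrete, here is a minimal Python sketch. It is not taken from the authors' toolbox: the function names, the choice of BLAKE2b as a stand-in hash family, and the default of 256 hash functions are all assumptions made for illustration. It computes the exact Jaccard and generalized Jaccard similarities and estimates the former with a plain (unweighted) MinHash signature; the weighted MinHash algorithms covered by the review are not reproduced here.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B| of two binary sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def generalized_jaccard(x, y):
    """Exact generalized Jaccard similarity of two weighted sets.

    x and y map element -> non-negative weight; a missing element has weight 0.
    The result is sum(min(x_k, y_k)) / sum(max(x_k, y_k)).
    """
    keys = set(x) | set(y)
    num = sum(min(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    den = sum(max(x.get(k, 0.0), y.get(k, 0.0)) for k in keys)
    return num / den if den else 1.0


def _hash(index, element):
    """Hypothetical hash family: one 64-bit hash per (index, element) pair."""
    data = f"{index}:{element!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """MinHash signature of a non-empty binary set (illustrative only)."""
    items = list(items)
    return [min(_hash(i, e) for e in items) for i in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima collide."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


if __name__ == "__main__":
    s, t = {1, 2, 3}, {2, 3, 4}
    print(jaccard(s, t))  # 0.5
    print(generalized_jaccard({"a": 2, "b": 1}, {"a": 1, "c": 1}))  # 0.25
    print(estimate_jaccard(minhash_signature(s), minhash_signature(t)))
```

The estimate in the last line fluctuates around the exact value of 0.5; using more hash functions narrows the spread at the cost of longer signatures.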