Implementation of a System for Fast Text Search and Document Comparison

Published: 01 Jan 2014, Last Modified: 11 Nov 2024Intelligent Tools for Building a Scientific Information Platform 2014EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: This chapter presents an architecture of the system for fast text search and documents comparison with main focus on N-gram-based algorithm and its parallel implementation. The algorithm which is one of several computational procedures implemented in the system is used to generate a fingerprint of analyzed documents as a set of hashes which represent the file. This work examines the performance of the system, both in terms of a file comparison quality and a fingerprint generation. Several tests were conducted of N-gram-based algorithm for Intel Xeon E5645, 2.40 GHz which show approximately 8x speedup of multi over single core implementation.
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview