Using Link-Based Content Analysis to Measure Document Similarity Effectively

Pei Li, Zhixu Li, Hongyan Liu, Jun He, Xiaoyong Du

Published: 2009, Last Modified: 13 Nov 2023, APWeb/WAIM 2009, Readers: Everyone

Abstract: As a massive amount of information is placed online, it becomes a challenge to exploit both the internal and the external information of documents when assessing the similarity between them. A variety of approaches have been proposed to model document similarity on different foundations, but they are usually not suited to combining internal and external information. In this paper, we introduce a link-based method, built on random walks on graphs, into content analysis. By defining similarity as the meeting probability of two random surfers, we propose a computational model for content analysis that can also be integrated with the external information of documents. An empirical study shows that our method achieves good accuracy, acceptable performance, and a fast convergence rate in multi-relational document similarity measurement.
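To make the idea of similarity as a meeting probability of two random surfers concrete, the following is a minimal sketch of a SimRank-style recurrence on a toy graph. It is an illustration under assumptions, not the paper's model: the abstract does not give the update rule, so the recurrence, the function name `meeting_similarity`, the `decay` and `iterations` parameters, and the toy term/document graph are all hypothetical choices made here.

```python
from itertools import product


def meeting_similarity(in_neighbors, decay=0.8, iterations=10):
    """SimRank-style similarity (an assumed stand-in, not the paper's model).

    in_neighbors maps every node to the list of nodes that link to it.
    Two surfers start at nodes a and b and step backwards along links;
    the score reflects how soon they are expected to meet.
    """
    nodes = list(in_neighbors)
    # Start with each node fully similar to itself and to nothing else.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, nodes)}

    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, nodes):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A surfer with nowhere to go can never meet the other one.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, damped by decay.
            total = sum(sim[(u, v)] for u in ia for v in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: terms t1, t2 link to the documents that contain them.
toy_graph = {
    "t1": [],
    "t2": [],
    "d1": ["t1", "t2"],
    "d2": ["t1"],
    "d3": ["t2"],
}

scores = meeting_similarity(toy_graph)
print(scores[("d1", "d2")], scores[("d2", "d3")])
```

In this toy graph, d1 and d2 share the term t1 and so receive a positive score, while d2 and d3 share no term and score zero. External information, such as hyperlinks or citations between documents, could in principle be added as further edges in the same graph, which is one way to read the integration the abstract describes.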
