Guide focused crawler efficiently and effectively using on-line topical importance estimation

2008 (modified: 12 Nov 2022), SIGIR 2008, Readers: Everyone
Abstract: Focused crawling is a critical technique for topical resource discovery on the Web. We propose a new frontier prioritization algorithm, OTIE (On-line Topical Importance Estimation), which efficiently and effectively combines link-based and content-based analysis to evaluate the priority of each uncrawled URL in the frontier. We then demonstrate OTIE's advantages over traditional prioritization algorithms through real crawling experiments.
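
To make the idea of frontier prioritization concrete, the following is a minimal sketch of a crawl frontier ordered by a score that mixes a link-based signal with a content-based one. It is not the OTIE algorithm, whose details the abstract does not give. The class name `Frontier`, the weight `alpha`, and the linear mixing rule are all assumptions chosen for illustration, and both input scores are assumed to be precomputed values in [0, 1].

```python
import heapq
from itertools import count


class Frontier:
    """Hypothetical crawl frontier: highest combined score is crawled first."""

    def __init__(self, alpha=0.5):
        # alpha is an assumed weight: 1.0 = link score only, 0.0 = content only.
        self.alpha = alpha
        self._heap = []        # entries are (-priority, tie_breaker, url)
        self._tie = count()    # keeps ordering stable for equal priorities
        self._best = {}        # url -> best priority currently queued

    def score(self, link_score, content_score):
        # Assumed combination rule: a simple weighted average.
        return self.alpha * link_score + (1.0 - self.alpha) * content_score

    def push(self, url, link_score, content_score):
        priority = self.score(link_score, content_score)
        # Ignore updates that do not improve on the queued priority.
        if priority <= self._best.get(url, float("-inf")):
            return
        self._best[url] = priority
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._tie), url))

    def pop(self):
        # Skip stale heap entries left behind by earlier, lower-priority pushes.
        while self._heap:
            neg_priority, _, url = heapq.heappop(self._heap)
            if self._best.get(url) == -neg_priority:
                del self._best[url]
                return url, -neg_priority
        raise IndexError("pop from empty frontier")
```

In this sketch, pushing the same URL again with a higher score re-queues it at the new priority, which mimics how evidence about an uncrawled URL can accumulate as more pages linking to it are fetched.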