Automatic Deep Web Query Results Extraction Based on Tag Trees

Ying Xie, Wanli Zuo, Fengling He, Ying Wang

Published: 2009, Last Modified: 10 Feb 2025. Venue: ISCID (2) 2009. License: CC BY-SA 4.0

Abstract: Automatic extraction of deep Web query results is a key step in processing those results. Extracting the results correctly is a precondition for semantic annotation and data integration. This paper proposes a simple method, based on tag trees and on the features of deep Web query result pages, for extracting query results automatically. The method first builds a tag tree for the given result page. It then searches the tag tree top-down for minimal data regions and extracts the data records they contain. Experiments show that the method is effective.
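The abstract does not spell out how the tag tree is built or how a minimal data region is defined, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that a data region is a node with at least `min_records` children sharing the same structural signature (own tag plus child tags), and that a region is "minimal" when no descendant also qualifies. The names `Node`, `TagTreeBuilder`, `repeated_children`, `find_minimal_regions`, `extract_records`, and `SAMPLE_PAGE` are hypothetical; only Python's standard `html.parser` and `collections` modules are used.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Node:
    """One node of the tag tree."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""

    def full_text(self):
        # Own text first, then the text of each subtree, whitespace-normalized.
        parts = [self.text.strip()] + [c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)

    def signature(self):
        # Assumed similarity measure: own tag plus the tags of direct children.
        return (self.tag, tuple(c.tag for c in self.children))


class TagTreeBuilder(HTMLParser):
    """Builds a tag tree from an HTML page."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open ancestor with this tag; ignore stray closes.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def repeated_children(node, min_records=2):
    """Children sharing the most common signature, or [] if too few."""
    counts = Counter(c.signature() for c in node.children)
    if not counts:
        return []
    sig, _ = counts.most_common(1)[0]
    group = [c for c in node.children if c.signature() == sig and c.full_text()]
    return group if len(group) >= min_records else []


def find_minimal_regions(node, min_records=2):
    """Top-down search; a region is kept only if no descendant qualifies."""
    deeper = [r for c in node.children
              for r in find_minimal_regions(c, min_records)]
    if deeper:
        return deeper
    return [node] if repeated_children(node, min_records) else []


def extract_records(html, min_records=2):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    regions = find_minimal_regions(builder.root, min_records)
    return [[c.full_text() for c in repeated_children(r, min_records)]
            for r in regions]


SAMPLE_PAGE = """
<html><body>
<h1>Results</h1>
<ul>
<li><a href="/1">Book A</a><span>$10</span></li>
<li><a href="/2">Book B</a><span>$12</span></li>
<li><a href="/3">Book C</a><span>$9</span></li>
</ul>
<p>Page 1</p>
</body></html>
"""

if __name__ == "__main__":
    print(extract_records(SAMPLE_PAGE))
```

On the sample page, the three `li` elements share one signature, so the `ul` is reported as the only region and each `li` becomes one record. The signature comparison is deliberately crude: real result pages often have records whose inner structure varies, and a record with several identical children (for example a table row of `td` cells) would itself be picked as the "minimal" region under this definition, so a practical system would need a more tolerant similarity measure.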