Abstract: To access large-scale data sources efficiently and automatically, it is necessary to classify them into domains and categories. In this paper, we propose a novel approach that classifies data sources into detailed domain subjects by query probing. We train sample instances for each subject category and use them to probe each source, estimating the scale of data the source holds for each category. We then build a matrix to classify each data source into one or more subject categories, and develop a decision algorithm based on probing iteration to rectify the classification result. Our experiments over real deep web sources show that our approach achieves high accuracy across a variety of data sources.
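
As a rough illustration of the probing idea summarized in the abstract, the following minimal Python sketch builds a source-by-category matrix of probe hit counts and assigns each source to every category whose share of hits is large enough. It is not the paper's algorithm: the function names (`probe_matrix`, `classify`), the `count_hits` callback, the 0.3 share threshold, and the toy hit counts are all assumptions made for illustration, and the iterative rectification step is omitted.

```python
from typing import Callable, Dict, List

Matrix = Dict[str, Dict[str, int]]


def probe_matrix(
    sources: List[str],
    probes: Dict[str, List[str]],
    count_hits: Callable[[str, str], int],
) -> Matrix:
    """Build a source x category matrix of probe hit counts.

    `probes` maps each category to its sample query terms; `count_hits`
    is a hypothetical callback returning how many records a source
    reports for a given query.
    """
    return {
        src: {
            cat: sum(count_hits(src, query) for query in queries)
            for cat, queries in probes.items()
        }
        for src in sources
    }


def classify(row: Dict[str, int], threshold: float = 0.3) -> List[str]:
    """Return every category whose share of the source's hits meets the threshold."""
    total = sum(row.values())
    if total == 0:
        return []  # the source answered none of the probes
    return sorted(cat for cat, hits in row.items() if hits / total >= threshold)


# Toy data standing in for real probe responses (invented numbers).
toy_hits = {
    ("bookstore", "novel"): 120,
    ("bookstore", "poetry"): 80,
    ("bookstore", "engineer"): 5,
    ("jobboard", "engineer"): 300,
    ("jobboard", "nurse"): 200,
    ("jobboard", "sedan"): 250,
}
probes = {
    "books": ["novel", "poetry"],
    "jobs": ["engineer", "nurse"],
    "cars": ["sedan"],
}

matrix = probe_matrix(
    ["bookstore", "jobboard"],
    probes,
    lambda src, query: toy_hits.get((src, query), 0),
)

for source, row in matrix.items():
    print(source, row, classify(row))
```

In this toy setup a source can land in more than one category, which mirrors the abstract's "one or more subject categories"; how the paper actually sets its decision rule is not stated here.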