Machine Learning Approach for Homepage Finding Task

Published: 2002, Last Modified: 12 Jan 2026SPIRE 2002EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: This paper describes new machine learning approaches to predict the correct homepage in response to a user’s homepage finding query. This involves two phases. In the first phase, a decision tree is generated to predict whether a URL is a homepage URL or not. The decision tree then is used to filter out non-homepages from the web pages returned by a standard vector space information retrieval system. In the second phase, a logistic regression analysis is used to combine multiple sources of evidence based on the homepages remaining from the first step to predict which homepage is most relevant to a user’s query. 100 queries are used to train the logistic regression model and another 145 testing queries are used to evaluate the model derived. Our results show that about 84% of the testing queries had the correct homepage returned within the top 10 pages. This shows that our machine learning approaches are effective since without any machine learning approaches, only 59% of the testing queries had their correct answers returned within the top 10 hits.
Loading