Learning to change taxonomies

Elena Eneva; Valery A. Petrushin

Published: 01 Jan 2002, Last Modified: 20 May 2025. Venue: Data Mining and Knowledge Discovery: Theory, Tools, and Technology 2002. License: CC BY-SA 4.0

Abstract: Taxonomies are valuable tools for structuring and representing our knowledge about the world. They are widely used in domains where information about species, products, customers, publications, etc. needs to be organized. In the absence of standards, many taxonomies of the same entities can co-exist. A problem arises when data categorized in one taxonomy needs to be used by a procedure (methodology or algorithm) that relies on a different taxonomy. Usually, this problem is solved with a labor-intensive manual approach. This paper describes a machine learning approach that aids domain experts in changing taxonomies. It learns relationships between two taxonomies and maps the data from one taxonomy into the other. The proposed approach uses decision trees and bootstrapping to learn mappings of instances from the source taxonomy to the target taxonomy. A C4.5 decision tree classifier is trained on a small manually labeled training set and applied to a randomly selected sample of the unlabeled data. The classification results are analyzed, the misclassified items are corrected, and all items in the sample are added to the training set. This procedure is repeated until no unlabeled data remains or an acceptable error rate is reached. In the latter case, the last classifier is used to label all the remaining data. We test our approach on a database of products obtained from a grocery store chain and find that it performs well, reaching 92.6% accuracy while requiring the human expert to explicitly label only 18% of the entire data.
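
The iterative procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `train_stub`, `bootstrap_mapping`, `oracle`, `batch_size`, and `max_error` are hypothetical, items are assumed to be `(item_id, source_category)` pairs, and a simple majority-vote lookup stands in for the C4.5 decision tree so that the example needs only the standard library. The `oracle` callable plays the role of the domain expert who reviews and corrects each sampled batch.

```python
import random
from collections import Counter, defaultdict


def train_stub(examples):
    """Stand-in learner (NOT C4.5): majority target label per source category.

    `examples` is a non-empty list of ((item_id, source_cat), target_cat).
    """
    votes = defaultdict(Counter)
    overall = Counter()
    for (_item_id, source_cat), target_cat in examples:
        votes[source_cat][target_cat] += 1
        overall[target_cat] += 1
    default = overall.most_common(1)[0][0]

    def predict(item):
        _item_id, source_cat = item
        if source_cat in votes:
            return votes[source_cat].most_common(1)[0][0]
        return default  # unseen source category: fall back to most common label

    return predict


def bootstrap_mapping(seed, unlabeled, oracle, batch_size=20, max_error=0.1, rng=None):
    """Iteratively grow the training set with expert-corrected random samples.

    Returns (classifier, automatically labeled items, number of items the
    expert reviewed). The stopping rule is an assumed reading of the abstract.
    """
    rng = rng or random.Random(0)
    training = list(seed)
    pool = list(unlabeled)
    rng.shuffle(pool)  # taking slices of a shuffled pool gives random samples
    predict = train_stub(training)
    expert_checked = 0

    while pool:
        batch, pool = pool[:batch_size], pool[batch_size:]
        truth = [oracle(item) for item in batch]  # expert corrects the batch
        errors = sum(predict(item) != label for item, label in zip(batch, truth))
        training.extend(zip(batch, truth))  # all sampled items join the training set
        expert_checked += len(batch)
        predict = train_stub(training)  # retrain on the enlarged training set
        if errors / len(batch) <= max_error:
            break  # error rate acceptable: stop asking the expert

    # The last classifier labels whatever is left in the pool.
    auto_labeled = {item: predict(item) for item in pool}
    return predict, auto_labeled, expert_checked


if __name__ == "__main__":
    # Toy data: source category "sK" always maps to target category "tK".
    def expert(item):
        return "t" + item[1][1:]

    seed = [((i, f"s{i}"), f"t{i}") for i in range(5)]
    unlabeled = [(100 + i, f"s{i % 5}") for i in range(100)]
    _, auto, checked = bootstrap_mapping(seed, unlabeled, expert)
    print(f"expert reviewed {checked} items; {len(auto)} labeled automatically")
```

In a realistic setting the stand-in learner would be replaced by a decision tree trained on item features, and the error rate measured on each batch before retraining serves as the estimate of how well the current classifier would label the remaining data.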
