Model Matching: A Novel Framework to use Clustering Strategy to Solve the Classification Problem

Zhiyi Duan, Limin Wang, Minghui Sun

Published: 01 Jan 2019, Last Modified: 13 May 2023, IEEE Access 2019

Abstract: It is common practice to handle labeled data with classifiers and unlabeled data with clustering algorithms. Traditional Bayesian network classifiers (BNC$^{\mathcal{T}}$s), learned from a labeled training set $\mathcal{T}$, directly map an unlabeled test instance into the learned network structure to calculate the conditional probabilities used for classification. This neglects the information hidden in the unlabeled data and results in classification bias. To address this issue, we propose a novel learning framework, called model matching, that uses a “clustering” strategy to solve the classification problem. The labeled data are divided into clusters according to class label to learn a set of BNC$^{\mathcal{T}}$s, and a corresponding set of BNC$^{p}$s is built for each unlabeled test instance. To classify an instance, the cross-entropy method is applied to compare the structural similarity between BNC$^{\mathcal{T}}$ and BNC$^{p}$. Extensive experimental results on 46 datasets from the University of California at Irvine (UCI) machine learning repository demonstrate that, for BNCs, model matching helps improve generalization performance and outperforms several state-of-the-art classifiers such as tree-augmented naive Bayes and random forest.
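The abstract describes the pipeline only at a high level: partition the labeled data by class, learn one model per class, build instance-specific models, and compare the two by cross entropy. The Python sketch below illustrates that flow under assumptions that are NOT taken from the paper and is not the authors' algorithm. It assumes discrete attributes coded as integers; it summarizes a model's "structure" as a probability distribution over attribute pairs; it stands in for each class-level BNC$^{\mathcal{T}}$ with Laplace-smoothed mutual information between attribute pairs within that class (in the spirit of the edge weights used by tree-augmented naive Bayes); it stands in for the instance-level BNC$^{p}$ with the pointwise mutual information of the test instance's own values under each class's estimates; and it predicts the class with the lowest cross entropy. All names (`ClassModel`, `fit`, `predict`, and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def partition_by_class(X, y):
    """'Clustering' step: one group of training rows per class label."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    return dict(groups)


class ClassModel:
    """Hypothetical per-class model built from Laplace-smoothed counts.

    Assumes discrete attributes coded 0..n_values[i]-1 and at least two attributes.
    """

    def __init__(self, rows, n_values):
        self.n = len(rows)
        self.n_values = n_values
        d = len(n_values)
        self.single = [Counter(r[i] for r in rows) for i in range(d)]
        self.pair = {(i, j): Counter((r[i], r[j]) for r in rows)
                     for i, j in combinations(range(d), 2)}

    def p1(self, i, a):
        return (self.single[i][a] + 1) / (self.n + self.n_values[i])

    def p2(self, i, j, a, b):
        return (self.pair[(i, j)][(a, b)] + 1) / (self.n + self.n_values[i] * self.n_values[j])

    def mi(self, i, j):
        """Smoothed mutual information of (X_i, X_j) within this class (edge weight)."""
        total = 0.0
        for a in range(self.n_values[i]):
            for b in range(self.n_values[j]):
                pab = self.p2(i, j, a, b)
                total += pab * math.log(pab / (self.p1(i, a) * self.p1(j, b)))
        return total

    def pmi(self, i, j, x):
        """Pointwise mutual information of the instance's values x[i], x[j]."""
        a, b = x[i], x[j]
        return math.log(self.p2(i, j, a, b) / (self.p1(i, a) * self.p1(j, b)))


def normalize(weights, eps=1e-9):
    """Clip negatives to 0, add eps, and rescale so the weights sum to 1."""
    clipped = [max(w, 0.0) + eps for w in weights]
    total = sum(clipped)
    return [w / total for w in clipped]


def cross_entropy(p, q):
    """H(p, q) = -sum_k p_k log q_k (q must be strictly positive)."""
    return -sum(pk * math.log(qk) for pk, qk in zip(p, q))


def fit(X, y, n_values):
    """Learn one stand-in 'BNC^T' per class label."""
    return {c: ClassModel(rows, n_values) for c, rows in partition_by_class(X, y).items()}


def predict(models, x):
    """Match the instance-level structure against each class-level structure."""
    pairs = list(combinations(range(len(x)), 2))
    scores = {}
    for c, m in models.items():
        q = normalize([m.mi(i, j) for i, j in pairs])      # stand-in class-level structure
        p = normalize([m.pmi(i, j, x) for i, j in pairs])  # stand-in instance-level structure
        scores[c] = cross_entropy(p, q)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    X = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1)]
    y = ["a", "a", "a", "b", "b", "b"]
    models = fit(X, y, n_values=[2, 2, 2])
    label, scores = predict(models, (0, 0, 1))
    print(label, scores)
```

Negative pointwise values are clipped to zero before normalization only so that both vectors are valid probability distributions; this is a choice made for the sketch. Note also that $H(p, q) = H(p) + \mathrm{KL}(p \,\|\, q)$, so the cross-entropy score combines the entropy of the instance-level distribution with its divergence from the class-level one.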
