Heterogeneous-training: A Semi-supervised Text Classification Method

Published: 01 Jan 2023, Last Modified: 15 May 2025BDIOT 2023EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: With the advent of the information age, there are more and more text data on the Internet. As the most widely distributed information carrier with the largest amount of data, it is particularly important to use text classification technology to organize and manage massive data scientifically. In this paper, a semi-supervised ensemble learning algorithm Heterogeneous-training is proposed and applied to the field of text classification. Based on the Tri-training algorithm, the Heterogeneous-training algorithm improves the traditional Tri-training algorithm by using different classifiers, dynamically updating the probability threshold and adaptively editing data. A large number of experiments show that our method always outperforms Tri-training algorithm in text classification on benchmark text data sets.
Loading