Content Based Spam Text Classification: An Empirical Comparison between English and Chinese

Liumei Zhang, Jianfeng Ma, Yichuan Wang

Published: 2013, Last Modified: 10 Feb 2025, INCoS 2013, CC BY-SA 4.0

Abstract: Spam text, including e-mail and SMS messages, is a real and growing problem, primarily due to the widespread availability of digital handsets and the Internet. Filtering spam text has become a prominent topic across various research areas. The text bodies of different forms of communication expose a channel for spammers. In this study, text datasets in English and Chinese are pre-processed. Classical classifiers are then applied to the pre-processed datasets to evaluate the accuracy of the same classifier on each language. The behavior of the classifiers on English and on Chinese is compared, and the experimental results are discussed. In addition, whereas most existing spam text detection methods are based on English, classifiers suited to English text classification prove insufficient for Chinese text classification.
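
The workflow described in the abstract (pre-process each language, train the same classifier on both, compare accuracy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not name the classifiers or the pre-processing steps, so multinomial Naive Bayes stands in as one familiar classical classifier, character bigrams stand in for Chinese word segmentation, and the handful of messages are made-up toy examples rather than data from the paper. All function and variable names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize_english(text):
    """Lowercase the text and extract alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tokenize_chinese(text):
    """Return character bigrams over CJK characters.

    Assumption: bigrams are a simple stand-in for a real word segmenter,
    since Chinese text has no whitespace word boundaries.
    """
    chars = re.findall(r"[\u4e00-\u9fff]", text)
    if len(chars) < 2:
        return chars
    return [a + b for a, b in zip(chars, chars[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, tokenizer, alpha=1.0):
        self.tokenizer = tokenizer
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.token_counts[label].update(self.tokenizer(text))
        self.vocab = set().union(*self.token_counts.values())
        return self

    def predict(self, text):
        # Tokens never seen in training are ignored.
        tokens = [t for t in self.tokenizer(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            counts = self.token_counts[c]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.class_counts[c] / total_docs)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


# Made-up toy messages for illustration only (not from the paper's datasets).
LABELS = ["spam", "spam", "ham", "ham"]
EN_TRAIN = ["win a free prize now", "free cash offer",
            "see you at lunch", "meeting moved to monday"]
ZH_TRAIN = ["免费领取大奖", "点击链接中奖", "明天一起吃饭", "会议改到周一"]

# The same classifier is trained once per language; only the tokenizer differs.
en_model = NaiveBayes(tokenize_english).fit(EN_TRAIN, LABELS)
zh_model = NaiveBayes(tokenize_chinese).fit(ZH_TRAIN, LABELS)
```

Calling `accuracy(en_model, ...)` and `accuracy(zh_model, ...)` on held-out messages in each language would give the kind of per-language comparison the abstract describes. The only language-specific component in this sketch is the tokenizer, which is one plausible place where a pipeline tuned for English could fall short on Chinese.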