ML-MDLText: A Multilabel Text Categorization Technique with Incremental Learning

Marciele M. Bittencourt, Renato Moraes Silva, Tiago A. Almeida

Published: 2019, Last Modified: 16 Dec 2024, Venue: BRACIS 2019, License: CC BY-SA 4.0

Abstract: The growing number of textual documents available on electronic media and on the Internet poses real challenges for many applications that demand searching and content analysis. As a consequence, text classification has emerged as a field of great interest in machine learning over the last decade. In many real-world applications, textual documents can naturally belong to several categories at once; moreover, the values of their features can change over time, which requires learning approaches that can adjust their hypotheses efficiently. Therefore, online learning and multilabel classification are in the spotlight nowadays, since very few available approaches can handle both problems simultaneously without requiring problem transformation. In this study, we propose a new multilabel text classification method based on the minimum description length principle that can be applied to real-world, dynamic, and large-scale problems because it does not require transformation of the classification problem and naturally supports online learning. We evaluated the performance of the proposed method in three online learning scenarios using 15 benchmark datasets. The results indicate that the proposed method is competitive with established techniques.