Random Label Forests: An Ensemble Method with Label Subsampling For Extreme Multi-Label Problems

ACL ARR 2024 June Submission2336 Authors

15 Jun 2024 (modified: 02 Jul 2024) · CC BY 4.0
Abstract: Text classification is one of the essential topics in natural language processing, and predictions in text classification are often multi-label. A text classification task is therefore a particular machine learning problem: a multi-label classification problem. Recently, the number of labels has grown ever larger, especially in e-commerce applications, so handling text-related e-commerce problems requires increasingly more memory in many existing multi-label learning methods. Hence, using a distributed system to share this large memory requirement is a reasonable solution. We propose ``random label forests'', a distributed ensemble method that handles extremely large label spaces through label subsampling and parallel computing. Random label forests reduce memory usage per machine while maintaining competitive performance across six real-world data sets.
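The core idea in the abstract, an ensemble whose members each train on only a random subset of the labels, can be sketched as below. This is a minimal illustration, not the authors' implementation: the class name `RandomLabelForest`, the least-squares per-member learner, and all parameter names are assumptions chosen for brevity. The point it demonstrates is that each member's model size scales with the sampled label subset, not the full label space, so members can be trained on separate machines.

```python
import numpy as np

class RandomLabelForest:
    """Hypothetical sketch of an ensemble with label subsampling.

    Each member sees only a random subset of the label space, so its
    memory footprint scales with the subset size rather than the full
    label count; members are independent and could be trained in
    parallel on different machines.
    """

    def __init__(self, n_estimators=4, label_fraction=0.5, seed=0):
        self.n_estimators = n_estimators
        self.label_fraction = label_fraction
        self.rng = np.random.default_rng(seed)

    def fit(self, X, Y):
        # X: (n_samples, n_features); Y: (n_samples, n_labels) binary matrix.
        n_labels = Y.shape[1]
        k = max(1, int(self.label_fraction * n_labels))
        self.members_ = []
        for _ in range(self.n_estimators):
            # Sample a label subset for this ensemble member.
            labels = self.rng.choice(n_labels, size=k, replace=False)
            # Stand-in per-member learner: one linear model per sampled
            # label, fit jointly via least squares.
            W, *_ = np.linalg.lstsq(X, Y[:, labels], rcond=None)
            self.members_.append((labels, W))
        self.n_labels_ = n_labels
        return self

    def predict_scores(self, X):
        # Average scores over the members that sampled each label;
        # labels never sampled by any member keep score 0.
        scores = np.zeros((X.shape[0], self.n_labels_))
        counts = np.zeros(self.n_labels_)
        for labels, W in self.members_:
            scores[:, labels] += X @ W
            counts[labels] += 1
        return scores / np.maximum(counts, 1)
```

In this sketch the per-member learner is a placeholder; the memory-sharing argument only depends on the subsampling and averaging steps, which is what the snippet isolates.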
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Submission Number: 2336