Abstract: Text classification is one of the essential topics in natural language processing,
and a text can often be associated with multiple labels.
Thus, text classification can be cast as a particular machine learning problem: multi-label classification.
Recently, the number of labels has grown larger and larger, especially in e-commerce applications,
so handling text-related e-commerce problems with many existing multi-label learning methods requires more and more memory.
Hence, utilizing a distributed system to share this large memory requirement is a reasonable solution.
We propose ``random label forests'', a distributed ensemble method with label subsampling,
which handles extremely large label sets through subsampling and parallel computing.
Random label forests reduce the memory usage per computer while keeping competitive performance on six real-world data sets.
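To illustrate the core idea, here is a minimal, hypothetical Python sketch of a label-subsampling ensemble: each member trains classifiers only for a random subset of labels, so each machine's memory footprint scales with the subset size rather than the full label set. All names and modeling choices (one-vs-rest logistic regression, score averaging) are assumptions for illustration, not the paper's actual method.

```python
# Minimal sketch of a label-subsampling ensemble, based only on the
# abstract's description; this is NOT the authors' implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_forest_member(X, Y, label_subset_size, rng):
    """Train one ensemble member on a random subset of the labels.

    X: (n_samples, n_features) feature matrix.
    Y: (n_samples, n_labels) binary label-indicator matrix.
    In a distributed setting, each member would run on its own
    machine and only hold models for its sampled labels.
    """
    n_labels = Y.shape[1]
    subset = rng.choice(n_labels, size=label_subset_size, replace=False)
    # One binary one-vs-rest classifier per sampled label.
    return {j: LogisticRegression(max_iter=1000).fit(X, Y[:, j])
            for j in subset}

def predict_scores(members, X, n_labels):
    """Average each label's score over the members that sampled it."""
    scores = np.zeros((X.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for models in members:
        for j, clf in models.items():
            scores[:, j] += clf.predict_proba(X)[:, 1]
            counts[j] += 1
    counts[counts == 0] = 1  # labels never sampled keep a score of 0
    return scores / counts

# Toy usage: 3 members, each seeing 4 of 10 labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
Y = (rng.random((200, 10)) < 0.2).astype(int)
members = [train_forest_member(X, Y, 4, rng) for _ in range(3)]
print(predict_scores(members, X[:5], 10).round(2))
```

Because each member only ever touches its sampled label columns, per-machine memory grows with the subset size, which is the property the abstract highlights.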
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: NLP in resource-constrained settings
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English
Submission Number: 2336