Abstract: Text quantification is a supervised learning task that estimates the relative frequency of each class in a collection of unlabeled text documents. Quantification learning has a growing number of practical applications and presents unique challenges that classification work often overlooks, such as handling distribution shift. Many quantification studies evaluate models on artificially re-sampled test sets that vary the target label distribution. Although convenient, such label-based biased sampling alters the underlying test data distribution, making the resulting evaluations an unreliable basis for deploying models in practice. This paper introduces a text quantification benchmark consisting of 8 datasets spanning sentiment analysis, document categorization, and toxicity classification. We compare popular quantification baselines on the benchmark and show that no single model consistently outperforms the others. We therefore believe the benchmark will enable new community research on text quantification under temporal distribution shift and the development of reliable models for real-world applications.
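To make the evaluation protocol concrete, here is a minimal sketch of one standard quantification baseline, classify-and-count (CC), scored on label-resampled test sets of the kind the abstract critiques. The toy corpus, model choice, and prevalence grid are illustrative assumptions, not the benchmark's actual setup.

```python
# Illustrative sketch: the classify-and-count (CC) baseline evaluated on
# label-resampled test sets. Dataset, model, and prevalence grid are
# hypothetical choices, not the paper's actual protocol.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy binary sentiment corpus (placeholder for a real benchmark dataset).
pos = ["great movie", "loved it", "fantastic acting", "wonderful film"] * 50
neg = ["terrible movie", "hated it", "awful acting", "boring film"] * 50
texts = pos + neg
labels = np.array([1] * len(pos) + [0] * len(neg))

# Train a plain classifier; CC simply counts its predicted labels.
vec = TfidfVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

def resample_at_prevalence(texts, labels, p_pos, n=200):
    """Label-based biased sampling: draw a test set whose positive-class
    prevalence is forced to p_pos (the protocol the paper critiques)."""
    pos_idx = np.flatnonzero(labels == 1)
    neg_idx = np.flatnonzero(labels == 0)
    n_pos = int(round(p_pos * n))
    idx = np.concatenate([
        rng.choice(pos_idx, n_pos, replace=True),
        rng.choice(neg_idx, n - n_pos, replace=True),
    ])
    return [texts[i] for i in idx], labels[idx]

# Sweep the artificial target prevalence and report CC's estimation error.
for p_true in [0.1, 0.3, 0.5, 0.7, 0.9]:
    sample, y = resample_at_prevalence(texts, labels, p_true)
    p_hat = clf.predict(vec.transform(sample)).mean()  # CC estimate
    print(f"true prevalence {p_true:.1f}  CC estimate {p_hat:.2f}  "
          f"absolute error {abs(p_hat - p_true):.2f}")
```

Under this protocol, CC tends to pull its prevalence estimates toward the training-set prevalence as the forced test prevalence moves away from it, which is one reason evaluations built on such resampling can diverge from a model's behavior under naturally occurring (e.g., temporal) shift.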
Paper Type: long