Context Minimization through Linguistic Features: Optimizing the Trade-off between Performance and Efficiency in Text Classification

ACL ARR 2025 February Submission7159 Authors

16 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Pretrained language models have redefined text classification, consistently setting new benchmarks. However, their heavy demands on computation and time make them impractical in many resource-constrained environments. We introduce a simple yet effective approach that drastically minimizes input context while preserving classification performance. Our method integrates positional, syntactic, semantic, and statistical linguistic cues to identify the most informative contexts. We evaluate it on six diverse datasets, including our newly introduced CMLA11 dataset, rigorously assessing 35 context configurations per dataset. The approach delivers substantial efficiency gains: a 69–75% reduction in GPU memory usage, an 81–87% decrease in training time, and an 82–88% improvement in inference speed. Despite these drastic resource savings, our best configurations remain near parity with full-length inputs, with macro-F1 reductions averaging only 1.39% and 3.10%, and some configurations even outperform the baseline. Beyond efficiency, our method also yields remarkable data compression, reducing dataset sizes by an average of 72.57% and by up to 92.63% for longer documents. These findings underscore the potential of context minimization for real-world text classification, enabling substantial computational savings with minimal performance trade-offs.
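The abstract does not spell out how the linguistic cues are combined, but the general idea of context minimization can be sketched with two of the feature families it names: a positional cue (keep the lead sentence) and a statistical cue (keep sentences with the most distinctive terms, scored here with a simple inverse-document-frequency average over the document's own sentences). Everything below — the function names, the scoring formula, and the token budget — is an illustrative assumption, not the authors' actual configuration:

```python
import math
import re
from collections import Counter

def sentences(text):
    """Naive sentence splitter on ., !, ? (illustrative only)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def minimize_context(text, budget=30):
    """Keep the lead sentence (positional cue), then add the sentences whose
    terms are rarest across the document's sentences (statistical cue)
    until a rough token budget is exhausted."""
    sents = sentences(text)
    if not sents:
        return ""
    tokens = [re.findall(r"\w+", s.lower()) for s in sents]
    # document frequency of each term, counted over sentences
    df = Counter(t for toks in tokens for t in set(toks))
    n = len(sents)

    def score(toks):
        # average inverse sentence-frequency: rarer terms -> higher score
        if not toks:
            return 0.0
        return sum(math.log(n / df[t]) for t in toks) / len(toks)

    # always keep the first sentence; rank the rest by distinctiveness
    ranked = sorted(range(1, n), key=lambda i: score(tokens[i]), reverse=True)
    kept, used = [0], len(tokens[0])
    for i in ranked:
        if used + len(tokens[i]) > budget:
            break
        kept.append(i)
        used += len(tokens[i])
    # re-emit the kept sentences in their original order
    return " ".join(sents[i] for i in sorted(kept))
```

The minimized string would then replace the full document as classifier input; the 35 configurations evaluated in the paper presumably vary which cues are used and how much context is retained.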
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Efficient/Low-Resource Methods for NLP, NLP Applications
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Approaches to low-compute settings/efficiency
Languages Studied: English
Submission Number: 7159