Strength lies in both: a blend of static and contextual word-information improves performance of low-resource NLP
Abstract: Low-resource NLP often suffers from insufficient computing resources and data scarcity. In particular, low-resource autonomous devices and resource-constrained environments require a low memory footprint, optimal accuracy under scarce data, and reproducibility of results. To address these issues, we combine the contextual and static information of a word to form a blended embedding. Blended embeddings and CNN/RNN fusion models reduce energy cost, inference time, and carbon emissions while maximizing NLP accuracy, avoiding resource-intensive transformer models such as BERT and its low-resource variants. Experiments on several GLUE datasets demonstrate that the developed models compete with other low-resource solutions, such as DistilBERT, mBERT, TinyBERT, and BERT-mini, with the advantage of higher accuracy and lower energy cost. In addition, blended embedding shows the potential to improve reproducibility of model performance, measured as a reduction in the standard deviation of NLP accuracy. Furthermore, dataset cartography analysis of the training samples shows that blended embedding reduces the proportion of hard-to-learn samples. The proposed work provides a viable solution for NLP applications in resource-constrained environments, such as mobile devices and other embedded platforms.
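The abstract does not specify how the static and contextual signals are combined; the sketch below is a minimal illustration under assumed design choices: a frozen static lookup table (standing in for pretrained vectors such as GloVe), a lightweight BiGRU standing in for the contextual encoder, concatenation as the blending operation, and a small CNN head as the fusion classifier. It is not the authors' implementation.

```python
import torch
import torch.nn as nn


class BlendedEmbedding(nn.Module):
    """Concatenate a frozen static word vector with a lightweight contextual one.

    The static table stands in for pretrained vectors (e.g., GloVe); the small
    BiGRU stands in for a contextual encoder. Both choices are illustrative.
    """

    def __init__(self, vocab_size, static_dim=100, ctx_dim=64):
        super().__init__()
        self.static = nn.Embedding(vocab_size, static_dim)
        self.static.weight.requires_grad = False  # keep static vectors fixed
        self.ctx_encoder = nn.GRU(static_dim, ctx_dim,
                                  batch_first=True, bidirectional=True)

    def forward(self, token_ids):                  # (batch, seq_len)
        s = self.static(token_ids)                 # (batch, seq_len, static_dim)
        c, _ = self.ctx_encoder(s)                 # (batch, seq_len, 2 * ctx_dim)
        return torch.cat([s, c], dim=-1)           # blend: static + contextual


class CNNFusionClassifier(nn.Module):
    """Small CNN head over blended embeddings, a low-footprint alternative
    to transformer encoders."""

    def __init__(self, vocab_size, num_classes=2, static_dim=100, ctx_dim=64):
        super().__init__()
        self.embed = BlendedEmbedding(vocab_size, static_dim, ctx_dim)
        blended_dim = static_dim + 2 * ctx_dim
        self.conv = nn.Conv1d(blended_dim, 128, kernel_size=3, padding=1)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)         # (batch, dim, seq_len)
        x = torch.relu(self.conv(x)).max(dim=-1).values   # global max pooling
        return self.classifier(x)


# Usage: classify a batch of two 8-token sequences (hypothetical vocabulary size).
model = CNNFusionClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 8)))
print(logits.shape)  # torch.Size([2, 2])
```

Concatenation is only one possible blending operation; weighted averaging or gating would fit the same interface, and the CNN head could be swapped for an RNN without changing the embedding layer.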
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Embedding, Contextual Information, Static Embedding, Reproducibility
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings/efficiency, Data analysis, Theory
Languages Studied: English
Submission Number: 8498