Strength lies in both: a blend of static and contextual word-information improves performance of low-resource NLP
Abstract: Low-resource NLP often suffers from insufficient computing resources and data scarcity. In particular, low-resource autonomous devices and resource-constrained environments require a low memory footprint, optimal accuracy under scarce data, and reproducibility of results. To address these issues, we combine the contextual and static information of a word to form a blended embedding. Blended embeddings and CNN/RNN fusion models reduce energy cost, inference time, and carbon emissions while maximizing NLP accuracy, avoiding resource-intensive transformer models such as BERT and its low-resource variants. Experiments on several GLUE datasets demonstrate that the developed models compete with other low-resource solutions, such as DistilBERT, mBERT, TinyBERT, and BERT-mini, with the advantage of higher accuracy and lower energy cost. In addition, blended embedding shows the potential to improve reproducibility of model performance, measured as a reduction in the standard deviation of NLP accuracy. Furthermore, dataset cartography analysis of the training samples shows that blended embedding reduces the proportion of hard-to-learn samples. The proposed work provides a viable solution for NLP applications in resource-constrained environments, such as mobile devices and other embedded platforms.
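The abstract does not specify how the static and contextual signals are combined; the sketch below is a minimal illustration under assumed design choices: a frozen static lookup table (standing in for pretrained vectors such as GloVe), a lightweight BiGRU standing in for the contextual encoder, concatenation as the blending operation, and a small CNN head as the fusion classifier. It is not the authors' implementation.

```python
import torch
import torch.nn as nn


class BlendedEmbedding(nn.Module):
    """Concatenate a frozen static word vector with a lightweight contextual one.

    The static table stands in for pretrained vectors (e.g., GloVe); the small
    BiGRU stands in for a contextual encoder. Both choices are illustrative.
    """

    def __init__(self, vocab_size, static_dim=100, ctx_dim=64):
        super().__init__()
        self.static = nn.Embedding(vocab_size, static_dim)
        self.static.weight.requires_grad = False  # keep static vectors fixed
        self.ctx_encoder = nn.GRU(static_dim, ctx_dim,
                                  batch_first=True, bidirectional=True)

    def forward(self, token_ids):                  # (batch, seq_len)
        s = self.static(token_ids)                 # (batch, seq_len, static_dim)
        c, _ = self.ctx_encoder(s)                 # (batch, seq_len, 2 * ctx_dim)
        return torch.cat([s, c], dim=-1)           # blend: static + contextual


class CNNFusionClassifier(nn.Module):
    """Small CNN head over blended embeddings, a low-footprint alternative
    to transformer encoders."""

    def __init__(self, vocab_size, num_classes=2, static_dim=100, ctx_dim=64):
        super().__init__()
        self.embed = BlendedEmbedding(vocab_size, static_dim, ctx_dim)
        blended_dim = static_dim + 2 * ctx_dim
        self.conv = nn.Conv1d(blended_dim, 128, kernel_size=3, padding=1)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids).transpose(1, 2)         # (batch, dim, seq_len)
        x = torch.relu(self.conv(x)).max(dim=-1).values   # global max pooling
        return self.classifier(x)


# Usage: classify a batch of two 8-token sequences (hypothetical vocabulary size).
model = CNNFusionClassifier(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 8)))
print(logits.shape)  # torch.Size([2, 2])
```

Concatenation is only one possible blending operation; weighted averaging or gating would fit the same interface, and the CNN head could be swapped for an RNN without changing the embedding layer.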
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Embedding, Contextual Information, Static Embedding, Reproducibility
Contribution Types: Approaches to low-resource settings, Approaches to low-compute settings/efficiency, Data analysis, Theory
Languages Studied: English
Submission Number: 8498