Abstract: Over recent years, state-of-the-art AI models have grown to a point where their use incurs significant economic and environmental costs.
At the same time, analyses of NLP models have shown that they are often overparameterized, giving rise to research on compression approaches.
Such approaches often suffer from a trade-off between hardware requirements and classification performance.
In this work, we propose the hardware-independent compression strategy Adaptive Parameter Compression (APC).
We extend the Weight Squeezing approach by introducing compression weights and biases, and we investigate multiple initialization strategies for these weights as well as the application of APC to individual transformer model components.
Experiments with BERT$_\text{base}$ demonstrate the effectiveness of the compression: the compressed model slightly outperforms DistilBERT while being significantly more efficient.
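The abstract does not spell out the exact form of the compression mapping. As a rough, non-authoritative illustration of what learned compression weights and a compression bias could look like when applied to a single teacher weight matrix, consider the following PyTorch sketch; all class and parameter names, the specific formulation W_student = C · W_teacher + B, and the Xavier initialization are assumptions for illustration only, not the paper's definition of APC.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CompressedLinear(nn.Module):
    """Sketch: derive a smaller student weight matrix from a frozen teacher
    weight matrix via learned compression weights and a compression bias."""

    def __init__(self, teacher_weight: torch.Tensor, student_out_dim: int):
        super().__init__()
        teacher_out_dim, in_dim = teacher_weight.shape
        # The teacher weights are kept frozen; only the compression
        # parameters are trained.
        self.register_buffer("teacher_weight", teacher_weight)
        # Learned compression weights map the teacher's output dimension
        # down to the student's output dimension.
        self.compress_weight = nn.Parameter(
            torch.empty(student_out_dim, teacher_out_dim)
        )
        # Learned compression bias added to the resulting student weights.
        self.compress_bias = nn.Parameter(torch.zeros(student_out_dim, in_dim))
        # One possible initialization strategy (illustrative choice).
        nn.init.xavier_uniform_(self.compress_weight)

    def student_weight(self) -> torch.Tensor:
        # W_student = C @ W_teacher + B
        return self.compress_weight @ self.teacher_weight + self.compress_bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.student_weight())
```

Under these assumptions, the number of trainable parameters is governed by the compression matrix and bias rather than the full teacher layer, which is what makes such a scheme hardware-independent: the compressed student can be materialized once after training and used without any special inference support.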
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: pruning, distillation
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Theory
Languages Studied: English
Submission Number: 458