Abstract: Pruning, as a technique to reduce the complexity and size of transformer-based models, has gained significant attention in recent years. While various models have been successfully pruned, pruning bidirectional encoder representations from transformers (BERT) poses unique challenges due to its fine-grained structure and overparameterization. However, by carefully considering these factors, it is possible to prune BERT without significantly degrading its pretraining loss. In this article, we propose a meta-learning-based pruning approach that adaptively identifies and eliminates insignificant attention weights. The performance of the proposed model is compared with several baseline models as well as the default fine-tuned BERT model. The baseline pruning strategies employ low-level pruning techniques, removing only 20% of the connections. The experimental results show that the proposed model outperforms the baseline models in terms of lower inference latency, higher Matthews correlation coefficient (MCC), and lower loss, although no significant improvement is observed in average floating-point operations (FLOPs). Furthermore, we conduct a comparative evaluation of the baseline models and our proposed model using two explainable artificial intelligence (XAI) approaches: while the other models allocate considerable attention to less significant words for sentiment classification, our model assigns higher probabilities to the most significant sentiment-bearing words.
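
The abstract does not specify how the 20% low-level pruning baseline is implemented; the following is a minimal illustrative sketch (not the authors' code) of one common realization: unstructured magnitude pruning of the attention projection weights in a pre-trained BERT model, assuming PyTorch's `torch.nn.utils.prune` utilities and the Hugging Face `transformers` library.

```python
import torch.nn.utils.prune as prune
from transformers import BertForSequenceClassification

# Load a fine-tunable BERT classifier (model choice is an assumption for illustration).
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")

# Remove 20% of the lowest-magnitude connections in every attention projection,
# mirroring the "low-level pruning of 20% of the connections" baseline.
for layer in model.bert.encoder.layer:
    attn = layer.attention
    for module in (attn.self.query, attn.self.key, attn.self.value, attn.output.dense):
        prune.l1_unstructured(module, name="weight", amount=0.2)
        prune.remove(module, "weight")  # bake the zeroed mask into the weight tensor
```

The proposed meta-learning approach differs in that the weights to remove are selected adaptively rather than by a fixed magnitude threshold; the sketch above only illustrates the baseline setting against which it is compared.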