DACT-BERT: Increasing the efficiency and interpretability of BERT by using adaptive computation time.

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Abstract: Large-scale pre-trained language models have shown remarkable results in diverse NLP applications. Unfortunately, these performance gains have been accompanied by a significant increase in computation time and model size, stressing the need for new or complementary strategies to increase the efficiency and interpretability of current large language models such as BERT. In this paper we propose DACT-BERT, a differentiable adaptive computation time strategy for the BERT language model. DACT-BERT adds an adaptive computation mechanism to BERT's regular processing pipeline. This mechanism controls the number of transformer blocks that BERT needs to execute at inference time, so that the model makes predictions based on the intermediate representations, encoded by the pre-trained weights, that are most appropriate for the task. Compared to previous work, DACT-BERT has the advantage of being fully differentiable and directly integrated into BERT's main processing pipeline, which enables the incorporation of gradient-based transparency mechanisms to improve interpretability. Furthermore, by discarding unnecessary computation steps, DACT-BERT makes it easier to understand the process BERT follows to reach an inference. Our experiments show that our approach significantly reduces computational complexity without affecting model accuracy, and that DACT-BERT helps to improve model interpretability.
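To make the mechanism described in the abstract concrete, the sketch below attaches an intermediate classifier and a sigmoid halting unit to each of BERT's transformer blocks and accumulates their predictions in a DACT-style weighted sum. This is a minimal illustration based only on the abstract: the class name, head design, and halting threshold are assumptions, not the authors' implementation.

import torch
import torch.nn as nn
from transformers import BertModel

class DACTBertSketch(nn.Module):
    """Illustrative DACT-style adaptive-depth head over BERT's transformer
    blocks (hypothetical names and hyperparameters, not the paper's code)."""

    def __init__(self, bert: BertModel, num_labels: int, halt_threshold: float = 1e-2):
        super().__init__()
        self.bert = bert
        hidden = bert.config.hidden_size
        n_layers = bert.config.num_hidden_layers
        # One intermediate classifier and one halting unit per transformer block.
        self.classifiers = nn.ModuleList([nn.Linear(hidden, num_labels) for _ in range(n_layers)])
        self.halting = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_layers)])
        self.halt_threshold = halt_threshold  # stop once later blocks can barely change the output

    def forward(self, input_ids, attention_mask=None):
        if attention_mask is None:
            attention_mask = torch.ones_like(input_ids)
        ext_mask = self.bert.get_extended_attention_mask(attention_mask, input_ids.shape)
        h = self.bert.embeddings(input_ids)

        accumulated = None                                                  # running prediction
        p = torch.ones(input_ids.size(0), 1, device=input_ids.device)       # remaining influence
        for block, clf, halt in zip(self.bert.encoder.layer, self.classifiers, self.halting):
            h = block(h, attention_mask=ext_mask)[0]                        # run one transformer block
            cls = h[:, 0]                                                   # [CLS] representation
            y_n = clf(cls).softmax(dim=-1)                                  # this block's prediction
            # DACT-style accumulation: deeper blocks contribute less as p shrinks.
            accumulated = y_n if accumulated is None else p * y_n + (1 - p) * accumulated
            p = p * torch.sigmoid(halt(cls))                                # shrink remaining influence
            # At inference, skip the remaining blocks once their possible
            # contribution to the final prediction is negligible.
            if not self.training and p.max() < self.halt_threshold:
                break
        return accumulated

Because the accumulated prediction is a convex combination weighted by the remaining influence p, training stays fully differentiable, while at inference the loop can stop as soon as the blocks not yet executed could no longer change the answer appreciably.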
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
One-sentence Summary: We propose a version of BERT that uses the DACT algorithm for efficiency and interpretability.
Reviewed Version (pdf): https://openreview.net/references/pdf?id=rkkGKBaN-