Multitask Learning Using BERT with Task-Embedded Attention

IJCNN 2021
Abstract: Multitask learning helps obtain a meaningful representation of the data while keeping the number of parameters needed to train the model small. In natural language processing, models often reach a few hundred million trainable parameters, which makes adapting them to new tasks computationally infeasible. Sharing layers across multiple tasks conserves resources and often leads to better data representations. In this work, we propose a new approach to training a BERT model in a multitask setup, which we call EmBERT. To introduce information about the task, we inject task-specific embeddings into the multi-head attention layers. Our modified architecture requires a minimal number of additional parameters relative to the original BERT model (+0.025% per task) while achieving state-of-the-art results on the GLUE benchmark.
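Below is a minimal sketch, in PyTorch, of how a task-specific embedding might be injected into a self-attention layer. It assumes the task embedding is a learned per-task vector added to the hidden states entering the attention block; the exact injection point and parameter layout used by EmBERT may differ, and all names here are illustrative rather than the authors' implementation.

```python
# Hypothetical sketch of task-embedded attention: one learned vector per task
# is added to the hidden states before self-attention is applied.
import torch
import torch.nn as nn

class TaskEmbeddedAttention(nn.Module):
    def __init__(self, hidden_size=768, num_heads=12, num_tasks=8):
        super().__init__()
        # num_tasks * hidden_size extra parameters: a tiny fraction of the
        # ~110M parameters in BERT-base, in the spirit of the paper's
        # "minimal additional parameters per task" claim.
        self.task_embeddings = nn.Embedding(num_tasks, hidden_size)
        self.attention = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)

    def forward(self, hidden_states, task_id):
        # hidden_states: (batch, seq_len, hidden_size); task_id: (batch,)
        task_vec = self.task_embeddings(task_id).unsqueeze(1)  # (batch, 1, hidden)
        conditioned = hidden_states + task_vec  # broadcast over the sequence
        out, _ = self.attention(conditioned, conditioned, conditioned)
        return out

# Usage: the same shared layer processes inputs from two different tasks.
layer = TaskEmbeddedAttention()
x = torch.randn(2, 16, 768)
print(layer(x, torch.tensor([0, 1])).shape)  # torch.Size([2, 16, 768])
```

The design point this illustrates is that the attention weights and projections stay shared across tasks; only the small per-task embedding table is task-specific.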