Abstract: We propose a multi-task learning (MTL) framework for natural language understanding tasks such as sentiment and topic classification. We use a bidirectional transformer architecture to generate encoded representations of the input, followed by task-specific layers for classification. The MTL framework trains a set of different tasks in parallel, acting as a form of additional regularization, to improve the generalizability of the trained model over individual tasks. We introduce a task-specific auxiliary problem based on the k-means clustering algorithm, trained in parallel with the main tasks to reduce the model's generalization error on the main task; POS tagging is also used as an auxiliary task. We further train on multiple benchmark classification datasets in parallel to improve the effectiveness of our bidirectional transformer network across all datasets. Our proposed MTL transformer network improves the state-of-the-art accuracy on the Movie Review (MR), AG News, and Stanford Sentiment Treebank (SST-2) corpora by 6%, 1.4%, and 3.3%, respectively.
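To make the k-means auxiliary task concrete, the sketch below shows one plausible way to derive auxiliary labels: cluster the encoder's sentence embeddings and use the cluster ids as targets for an extra classification head. This is an illustration under assumed details (the function name, farthest-point initialization, and the toy data are ours, not from the paper), not the authors' exact pipeline.

```python
import numpy as np

def kmeans_pseudo_labels(embeddings, k=3, iters=20):
    """Assign k-means cluster ids to encoder embeddings.

    The returned ids can serve as pseudo-labels for an auxiliary
    classification head trained alongside the main task (illustrative).
    """
    # Deterministic farthest-point initialization of the k centers.
    centers = [embeddings[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(embeddings - c, axis=1) for c in centers], axis=0)
        centers.append(embeddings[d.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(iters):
        # Distance of every embedding to every center, shape (n, k).
        d = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            pts = embeddings[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return labels

# Toy "encoded representations": two well-separated blobs.
emb = np.vstack([np.zeros((5, 4)), np.ones((5, 4)) * 10.0])
labels = kmeans_pseudo_labels(emb, k=2)
```

In an MTL setup, the cluster id for each example would be predicted by its own output layer on top of the shared encoder, and that auxiliary loss would be added to the main-task loss during training.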