Deep Bangla Authorship Attribution Using Transformer Models

Published: 01 Jan 2021, Last Modified: 11 Nov 2023 · CSoNet 2021
Abstract: Authorship attribution is a well-known problem in the domain of Natural Language Processing (NLP). By leveraging state-of-the-art (SOTA) NLP techniques such as transformer models, this problem domain has seen considerable advancement. However, this progress has unfortunately been confined to well-resourced languages like English, French, and German. Under-resourced languages like Bangla have yet to leverage such SOTA techniques to make a breakthrough in this domain. In this study, we address this research gap and contribute to the Bangla authorship attribution problem by building highly accurate models using several SOTA transformer variants, namely mBERT, bnBERT, bnElectra, and bnRoBERTa. Starting from the pre-trained weights of these models, we perform fine-tuning for the task of attributing texts to 16 prominent Bangla writers. Results show that our bnBERT model classifies the authors with a superior accuracy of 98% and outperforms all existing models in the literature.
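
The abstract gives no implementation details, but the general recipe it describes, fine-tuning a pre-trained transformer with a 16-way classification head, can be sketched with the Hugging Face Transformers library. The following is a minimal illustration under stated assumptions: the checkpoint name sagorsarker/bangla-bert-base (a public Bangla BERT that may or may not correspond to the paper's bnBERT), the sample text, and the learning rate are placeholders, not the authors' actual configuration.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    NUM_AUTHORS = 16  # the paper attributes texts to 16 prominent Bangla writers

    # Assumption: "sagorsarker/bangla-bert-base" is a publicly available Bangla
    # BERT checkpoint; the abstract does not confirm it is the exact "bnBERT"
    # variant used in the paper.
    MODEL_NAME = "sagorsarker/bangla-bert-base"

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_AUTHORS
    )

    # Toy batch: one Bangla passage paired with its author index in [0, 15].
    # Real training would iterate over a labeled corpus of each writer's texts.
    texts = ["আমার সোনার বাংলা, আমি তোমায় ভালোবাসি।"]
    labels = torch.tensor([0])

    # Encode, run a forward pass through the classification head, and
    # backpropagate the cross-entropy loss: one step of the fine-tuning loop.
    enc = tokenizer(texts, truncation=True, padding=True, max_length=512,
                    return_tensors="pt")
    out = model(**enc, labels=labels)
    out.loss.backward()

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)  # illustrative LR
    optimizer.step()

In practice this single step would be wrapped in a full training loop (or transformers.Trainer) over the corpus of the 16 authors, with held-out accuracy used for evaluation as reported in the paper.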