BERT fine-tuning for Japanese Author Attribution using Stylometric Features

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Authorship Attribution (AA) is the task of identifying the author of a text. This work is a novel application of deep learning to AA in Japanese, where such approaches have so far been limited: prior Japanese AA studies have relied predominantly on Random Forests and SVMs, focused on small author groups, and faced a scarcity of datasets for author identification. We instead fine-tune a pre-trained BERT model and assess its efficacy both on text alone and with Japanese-specific stylistic features added. Using text alone, the model attains 84% accuracy on a five-author set and 82% on an expanded set of 80 authors, indicating that deep learning can handle larger author pools. Adding stylistic features for a 25-author set, however, reduced accuracy to 53%. The model further achieved 97% accuracy in distinguishing Japanese speakers and 61% in predicting writer nationality. These results demonstrate the viability of deep learning-based AA in Japanese and mark a significant advance in the domain.
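For readers unfamiliar with the setup, the abstract's text-only scenario corresponds to standard sequence-classification fine-tuning. Below is a minimal sketch using HuggingFace Transformers; the checkpoint (cl-tohoku/bert-base-japanese), hyperparameters, and toy data are illustrative assumptions, as the abstract does not specify them.

```python
# A minimal sketch of the text-only fine-tuning setup described in the
# abstract. The checkpoint, hyperparameters, corpus, and label set below
# are illustrative assumptions, not the paper's actual configuration.
# Note: cl-tohoku/bert-base-japanese also requires the fugashi and ipadic
# packages for its MeCab-based tokenizer.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "cl-tohoku/bert-base-japanese"  # assumed Japanese BERT checkpoint
NUM_AUTHORS = 5  # the five-author setting reported in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_AUTHORS
)

# Hypothetical training data: (text, author_id) pairs.
texts = ["吾輩は猫である。名前はまだ無い。", "恥の多い生涯を送って来ました。"]
labels = torch.tensor([0, 1])

enc = tokenizer(texts, truncation=True, padding=True,
                max_length=512, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
    batch_size=8, shuffle=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # assumed epoch count
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()  # cross-entropy loss over author classes
        optimizer.step()
        optimizer.zero_grad()
```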
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: Japanese
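The abstract also reports a setting where Japanese-specific stylistic features are added to the model. One plausible fusion, sketched below, concatenates a hand-crafted stylometric vector with BERT's [CLS] representation before the classifier head; the paper's actual fusion architecture is not described in the abstract, so this is an assumption for illustration.

```python
# A hedged sketch of one way stylometric features could be combined with
# BERT: concatenate a hand-crafted style vector (e.g. hiragana/katakana/
# kanji ratios, mean sentence length, particle frequencies) with the [CLS]
# representation before classification. This is an illustrative assumption,
# not the paper's documented architecture.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertWithStylometry(nn.Module):
    def __init__(self, model_name="cl-tohoku/bert-base-japanese",
                 n_style_feats=16, n_authors=25):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + n_style_feats, n_authors)

    def forward(self, input_ids, attention_mask, style_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]          # [CLS] token vector
        fused = torch.cat([pooled, style_feats], dim=-1)
        return self.classifier(fused)                 # author logits
```

Concatenation at the classifier head is only one of several possible fusion points; the abstract does not state which the authors used, only that adding the features lowered accuracy to 53% in the 25-author setting.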