BERT fine-tuning for Japanese Author Attribution using Stylometric Features

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Authorship Attribution (AA) is the task of identifying the author of a text. This work is a novel application of deep learning to AA in Japanese, where such approaches have so far been limited: prior Japanese AA studies have relied predominantly on Random Forests and SVMs, focused on small author groups, and faced a scarcity of datasets for author identification. We instead fine-tune a pre-trained BERT model and assess its efficacy both on text alone and with Japanese-specific stylistic features added. Using text alone, the model attains 84% accuracy on a five-author set and 82% on an expanded set of 80 authors, indicating that deep learning can handle larger author pools. Adding stylistic features for a 25-author set, however, reduced accuracy to 53%. The model further achieved 97% accuracy in distinguishing Japanese speakers and 61% in predicting writer nationality. These results demonstrate the viability of deep learning-based AA in Japanese and mark a significant advance in the domain.
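For readers unfamiliar with the setup, the abstract's text-only scenario corresponds to standard sequence-classification fine-tuning. Below is a minimal sketch using HuggingFace Transformers; the checkpoint (cl-tohoku/bert-base-japanese), hyperparameters, and toy data are illustrative assumptions, as the abstract does not specify them.

```python
# A minimal sketch of the text-only fine-tuning setup described in the
# abstract. The checkpoint, hyperparameters, corpus, and label set below
# are illustrative assumptions, not the paper's actual configuration.
# Note: cl-tohoku/bert-base-japanese also requires the fugashi and ipadic
# packages for its MeCab-based tokenizer.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "cl-tohoku/bert-base-japanese"  # assumed Japanese BERT checkpoint
NUM_AUTHORS = 5  # the five-author setting reported in the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_AUTHORS
)

# Hypothetical training data: (text, author_id) pairs.
texts = ["吾輩は猫である。名前はまだ無い。", "恥の多い生涯を送って来ました。"]
labels = torch.tensor([0, 1])

enc = tokenizer(texts, truncation=True, padding=True,
                max_length=512, return_tensors="pt")
loader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
    batch_size=8, shuffle=True,
)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # assumed epoch count
    for input_ids, attention_mask, y in loader:
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()  # cross-entropy loss over author classes
        optimizer.step()
        optimizer.zero_grad()
```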
Paper Type: long
Research Area: Machine Learning for NLP
Contribution Types: NLP engineering experiment
Languages Studied: Japanese
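The abstract also reports a setting where Japanese-specific stylistic features are added to the model. One plausible fusion, sketched below, concatenates a hand-crafted stylometric vector with BERT's [CLS] representation before the classifier head; the paper's actual fusion architecture is not described in the abstract, so this is an assumption for illustration.

```python
# A hedged sketch of one way stylometric features could be combined with
# BERT: concatenate a hand-crafted style vector (e.g. hiragana/katakana/
# kanji ratios, mean sentence length, particle frequencies) with the [CLS]
# representation before classification. This is an illustrative assumption,
# not the paper's documented architecture.
import torch
import torch.nn as nn
from transformers import AutoModel

class BertWithStylometry(nn.Module):
    def __init__(self, model_name="cl-tohoku/bert-base-japanese",
                 n_style_feats=16, n_authors=25):
        super().__init__()
        self.bert = AutoModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        self.classifier = nn.Linear(hidden + n_style_feats, n_authors)

    def forward(self, input_ids, attention_mask, style_feats):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]          # [CLS] token vector
        fused = torch.cat([pooled, style_feats], dim=-1)
        return self.classifier(fused)                 # author logits
```

Concatenation at the classifier head is only one of several possible fusion points; the abstract does not state which the authors used, only that adding the features lowered accuracy to 53% in the 25-author setting.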