Abstract: Recent years have witnessed rapid advancements in Large Language Models (LLMs). Nevertheless, it remains unclear whether state-of-the-art LLMs can infer the author of an anonymous research paper solely from the text, without any additional information.
To investigate this novel challenge, which we define as Open-World Authorship Attribution, we introduce a benchmark comprising thousands of research papers across various fields to quantitatively assess model capabilities. At the core of this paper, we then design a two-stage framework tailored to this problem: candidate selection and authorship decision. Specifically, in the first stage, LLMs are prompted to generate multi-level key information, which is then used to identify potential candidates through Internet searches.
In the second stage, we introduce key perspectives to guide LLMs in determining the most likely author from these candidates.
Extensive experiments on our benchmark demonstrate the effectiveness of the proposed approach, which achieves 60.7% and 44.3% accuracy in the two stages, respectively. We will release our benchmark and source code to facilitate future research in this field.
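A minimal sketch of the two-stage pipeline the abstract describes, assuming placeholder `call_llm` and `web_search` clients; the prompts and the key perspectives listed here are illustrative assumptions, not the paper's actual implementation:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call (assumption: plug in your own client)."""
    raise NotImplementedError

def web_search(query: str) -> list[str]:
    """Placeholder for an Internet search returning candidate author names (assumption)."""
    raise NotImplementedError

def select_candidates(paper_text: str) -> list[str]:
    """Stage 1: extract multi-level key information, then search for candidates."""
    key_info = call_llm(
        "Extract key information from this paper at multiple levels "
        "(research topic, specific methods, distinctive terminology), "
        "one item per line:\n" + paper_text
    )
    candidates: list[str] = []
    for line in key_info.splitlines():
        if line.strip():
            candidates.extend(web_search(line.strip()))
    # Deduplicate while preserving order of first appearance.
    return list(dict.fromkeys(candidates))

# Illustrative key perspectives for the authorship decision (assumption).
PERSPECTIVES = [
    "consistency of the paper's topic with the candidate's publication record",
    "similarity of writing style",
    "overlap with the candidate's prior work and citations",
]

def decide_author(paper_text: str, candidates: list[str]) -> str:
    """Stage 2: prompt the LLM with key perspectives to pick the likely author."""
    prompt = (
        "Given the anonymous paper below and a list of candidate authors, "
        "decide who most likely wrote it. Consider each perspective:\n"
        + "\n".join(f"- {p}" for p in PERSPECTIVES)
        + "\nCandidates: " + ", ".join(candidates)
        + "\nPaper:\n" + paper_text
        + "\nAnswer with a single candidate name."
    )
    return call_llm(prompt).strip()
```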
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: NLP Applications
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 627