Automatically Identifying Pseudepigraphic Texts

2013 (modified: 16 Jul 2019) | EMNLP 2013 | Readers: Everyone
Abstract: The identification of pseudepigraphic texts – texts not written by the authors to which they are attributed – has important historical, forensic and commercial applications. We introduce an unsupervised technique for identifying pseudepigrapha. The idea is to identify textual outliers in a corpus based on the pairwise similarities of all documents in the corpus. The crucial point is that document similarity not be measured in any of the standard ways but rather be based on the output of a recently introduced algorithm for authorship verification. The proposed method strongly outperforms existing techniques in systematic experiments on a blog corpus.
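The outlier idea in the abstract can be illustrated with a minimal sketch: score each document by its mean similarity to every other document in the corpus and flag the lowest-scoring one. This is an assumption-laden illustration, not the paper's method. The names `outlier_scores`, `most_likely_outlier`, and `jaccard_similarity` are hypothetical, and the word-overlap similarity used here is only a stand-in, precisely the kind of standard measure the abstract says should be replaced by the output of an authorship verification algorithm.

```python
from typing import Callable, List, Sequence

Similarity = Callable[[str, str], float]


def jaccard_similarity(a: str, b: str) -> float:
    """Placeholder similarity: word-set overlap between two texts.

    Assumption: this stands in for the verification-based similarity
    described in the abstract, which is not reproduced here.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def outlier_scores(docs: Sequence[str], similarity: Similarity) -> List[float]:
    """Return, for each document, its mean similarity to all other documents."""
    if len(docs) < 2:
        raise ValueError("need at least two documents to compare")
    scores = []
    for i, doc in enumerate(docs):
        others = [similarity(doc, other) for j, other in enumerate(docs) if j != i]
        scores.append(sum(others) / len(others))
    return scores


def most_likely_outlier(docs: Sequence[str], similarity: Similarity) -> int:
    """Return the index of the document least similar, on average, to the rest."""
    scores = outlier_scores(docs, similarity)
    return min(range(len(docs)), key=scores.__getitem__)


corpus = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the cat lay on the mat",
    "quantum flux capacitors hum loudly",
]
suspect = most_likely_outlier(corpus, jaccard_similarity)
```

Because the similarity function is passed in as a parameter, a verification-based measure could be substituted for the placeholder without changing the outlier-scoring logic.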