Abstract: Authorship attribution (AA), an area of research seeking to identify the author of a particular text, is typically conducted on a closed set of authors, and often on certain forms of text, such as the edited, less colloquial language found in news articles. This paper introduces a few-shot learning approach using prototypical networks and a mix of stylometric and pre-trained transformer features, applied to Reddit data. By employing few-shot learning and targeting social media text, we look to expand beyond the typical AA setting, allowing for disjoint author sets and shorter, more colloquial forms of English. Additionally, using subreddit IDs as a proxy for topics, we explore cross-topic analysis and break down performance accordingly. In so doing, we test the limits of AA, with the goal of setting a performance baseline and assessing the viability of few-shot learning for this task. Among the evaluated models, those trained with transformer embeddings outperformed those using only stylometric features, and accounting for differing subreddits revealed varying performance across models.
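The prototypical-network setup mentioned in the abstract can be sketched as follows. This is a minimal illustration of the general technique (not the paper's actual model or features): each author's "prototype" is the mean of the embeddings of their support texts, and a query text is attributed to the author with the nearest prototype by Euclidean distance. The embeddings, dimensions, and author labels below are toy values for illustration only.

```python
import numpy as np

def prototypes(support_embs, support_labels):
    """Mean embedding per class (author) over the support set."""
    classes = np.unique(support_labels)
    protos = np.stack([
        support_embs[support_labels == c].mean(axis=0) for c in classes
    ])
    return classes, protos

def classify(query_embs, protos, classes):
    """Attribute each query to the author with the nearest prototype."""
    # Pairwise Euclidean distances: (n_queries, n_classes)
    dists = np.linalg.norm(
        query_embs[:, None, :] - protos[None, :, :], axis=-1
    )
    return classes[dists.argmin(axis=1)]

# Toy episode: 2 authors, 3 support texts each, 4-dim embeddings.
# Clusters are well separated so the example is unambiguous.
rng = np.random.default_rng(0)
s_embs = np.vstack([rng.normal(0, 1, (3, 4)), rng.normal(5, 1, (3, 4))])
s_labels = np.array([0, 0, 0, 1, 1, 1])
classes, protos = prototypes(s_embs, s_labels)

q_embs = np.vstack([rng.normal(0, 1, (2, 4)), rng.normal(5, 1, (2, 4))])
print(classify(q_embs, protos, classes))
```

Because prototypes are recomputed from whatever support set is given, this formulation naturally handles disjoint author sets at train and test time, which is what makes it attractive for the open-ended AA setting the paper targets.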
Paper Type: long