HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis

Published: 07 Oct 2023, Last Modified: 01 Dec 2023, EMNLP 2023 Findings
Submission Type: Regular Long Paper
Submission Track: Resources and Evaluation
Submission Track 2: NLP Applications
Keywords: Authorship analysis, Spoken text, Large Language Model, AI text detection
TL;DR: We present a benchmark comprising 17 human spoken text datasets and spoken texts generated by 3 LLMs, and perform authorship analysis on the benchmark.
Abstract: $\textit{Authorship Analysis}$, also known as stylometry, has long been an essential aspect of Natural Language Processing (NLP). Likewise, the recent advancement of Large Language Models (LLMs) has made authorship analysis increasingly crucial for distinguishing between human-written and AI-generated texts. However, these authorship analysis tasks have primarily focused on $\textit{written texts}$, not $\textit{spoken texts}$. Thus, we introduce the largest benchmark for spoken texts - ${\sf HANSEN}$ ($\underline{H}$uman $\underline{AN}$d ai $\underline{S}$poken t$\underline{E}$xt be$\underline{N}$chmark). ${\sf HANSEN}$ encompasses meticulous curation of existing speech datasets accompanied by transcripts, alongside the creation of novel AI-generated spoken text datasets. Together, it comprises 17 human datasets and AI-generated spoken texts created using 3 prominent LLMs: ChatGPT, PaLM2, and Vicuna13B. To evaluate and demonstrate the utility of ${\sf HANSEN}$, we perform Authorship Attribution (AA) \& Author Verification (AV) on human-spoken datasets and conduct Human vs. AI text detection using state-of-the-art (SOTA) models. While SOTA methods, such as character n-gram or Transformer-based models, exhibit AA \& AV performance on human-spoken datasets similar to that on written ones, there is much room for improvement in AI-generated spoken text detection. The ${\sf HANSEN}$ benchmark is available at: https://huggingface.co/datasets/HANSEN-REPO/HANSEN
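For illustration, below is a minimal sketch of a character n-gram authorship attribution baseline of the kind referenced in the abstract, applied to one HANSEN subset loaded from the Hugging Face Hub. Only the repository ID comes from the abstract; the configuration name ("TED"), split names, and column names ("text", "author") are assumptions and should be checked against the dataset card.

```python
# Sketch: character n-gram authorship attribution (AA) baseline on a HANSEN subset.
# Assumed: config "TED", splits "train"/"test", columns "text" and "author".
from datasets import load_dataset
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Load one human spoken-text subset (config name assumed).
ds = load_dataset("HANSEN-REPO/HANSEN", "TED")
train_texts, train_authors = ds["train"]["text"], ds["train"]["author"]
test_texts, test_authors = ds["test"]["text"], ds["test"]["author"]

# Character 2- to 4-gram TF-IDF features with a linear classifier.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_authors)

print("AA accuracy:", accuracy_score(test_authors, model.predict(test_texts)))
```

Character n-grams are a common stylometric baseline because they capture sub-word style cues (punctuation habits, fillers, spelling variants) that transfer reasonably well from written to spoken transcripts.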
Submission Number: 2024