Assessing the Linguistic Characteristics of AI-Generated Texts Across Different Registers

University of Eastern Finland DRDHum 2024 Conference Submission 19

Published: 03 Jun 2024, Last Modified: 03 Jun 2024
DRDHum 2024 Best Paper
License: CC BY 4.0
Keywords: Multi-Dimensional Analysis, Register Variation, Artificial Intelligence
TL;DR: The study shows that AI struggles to match human writing across registers, highlighting the need to consider register when evaluating the quality of AI-generated text.
Abstract: Recent studies that compare AI-generated texts to those authored by humans predominantly focus on lexical characteristics. This has resulted in a limited understanding of the ability of AI to mimic human writing from a lexicogrammatical perspective. Furthermore, this body of research often overlooks the role of register variation, neglecting to examine the degree to which AI-generated texts reflect the register-specific features found in human-authored texts. The premise of this study is that an evaluation of AI-generated text quality necessitates consideration of register. Prior corpus-based analyses of human-authored texts have convincingly shown that register significantly influences linguistic variation (Biber, 2012). Hence, assessments aiming to determine equivalence between human-authored and AI-generated texts must take register into account. We further postulate that most training data for Large Language Models lacks explicit register labels, so register distinctions are largely inferred rather than directly learned, which raises concerns about the precision and reliability of register knowledge in AI models. In this paper, we employ Multi-Dimensional (MD) Analysis (Biber, 1988, 1995; Berber Sardinha & Veirano Pinto, 2014, 2019) to assess the similarity between AI-generated and human-authored texts. This involves a detailed MD analysis of a corpus comprising register-specific texts produced by humans in natural settings and texts generated by ChatGPT 3.5. The comparison is grounded in the five principal dimensions of register variation identified by Biber (1988), each defined by a set of co-occurring lexicogrammatical features. Both the AI and human subcorpora include four distinct registers: news reports, research articles (in Chemistry and Applied Linguistics), student compositions, and conversations, with each category containing 100 texts, for a total of 800 texts (546,568 words). The human-authored subcorpus was compiled from verified sources that predate the public availability of AI, to avoid any AI-generated content. The MD analysis indicated notable differences between AI-generated and human-authored texts across the individual registers and the five dimensions, with AI-generated texts generally failing to mirror their human counterparts accurately. Additionally, a linear discriminant analysis, conducted to evaluate whether dimension scores can predict text authorship, showed that AI-generated texts could be distinguished with relative ease on the basis of their multidimensional profiles. The findings highlight the challenges AI still faces in replicating natural human communication. The specifics of the register-based comparisons will be elaborated in the full paper.
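As a rough illustration of the classification step described in the abstract, the sketch below shows how a linear discriminant analysis over per-text dimension scores might be set up. The data here are randomly generated placeholders, and the array shapes, labels, and scikit-learn usage are our assumptions rather than the authors' actual pipeline; in the study itself, the scores would come from the MD analysis of the 800-text corpus.

```python
# Minimal sketch (not the authors' code): linear discriminant analysis over
# per-text scores on Biber's (1988) five dimensions, predicting authorship
# (human vs. AI). The dimension scores below are synthetic placeholders.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

n_per_class = 400  # 4 registers x 100 texts per subcorpus, as in the abstract
n_dims = 5         # Biber's five principal dimensions of register variation

# Hypothetical dimension-score matrices: rows = texts, columns = dimensions.
human_scores = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, n_dims))
ai_scores = rng.normal(loc=0.5, scale=1.0, size=(n_per_class, n_dims))

X = np.vstack([human_scores, ai_scores])
y = np.array(["human"] * n_per_class + ["ai"] * n_per_class)

# Cross-validated LDA accuracy indicates how separable the two groups are
# on the basis of their multidimensional profiles alone.
lda = LinearDiscriminantAnalysis()
accuracy = cross_val_score(lda, X, y, cv=10).mean()
print(f"Mean 10-fold accuracy (synthetic data): {accuracy:.2f}")
```

On real dimension scores, a high cross-validated accuracy in such a setup would correspond to the paper's finding that AI-generated texts are easily distinguishable from human-authored ones by their multidimensional profiles.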
Submission Number: 19