Exploring the Potential of AI-Generated Texts to Replace Human-Written Content in Language Education

University of Eastern Finland DRDHum 2024 Conference Submission 18 Authors

Published: 03 Jun 2024, Last Modified: 03 Jun 2024. DRDHum 2024 BestPaper. License: CC BY 4.0
Keywords: Multi-Dimensional Analysis, Language Teaching, Artificial Intelligence
TL;DR: Research examines if AI can match the complexity of English coursebook texts for language teaching. Analysis shows AI texts differ notably from human-written texts.
Abstract: The incorporation of AI-generated texts into educational materials is an emerging topic of interest, particularly concerning the potential application of AI in crafting tasks for language teaching. The goal of our research is to examine the capability of AI-generated texts to replicate the characteristics of English coursebook text samples used in language instruction. This analysis makes it possible to assess the viability of replacing traditional human-written instructional content with AI-generated texts in educational settings. Our investigation is set against the background that textbook texts, conventionally employed as exemplars in language education, are specifically tailored and revised to match a certain level of difficulty, and thus do not fully represent authentic language usage in everyday scenarios. Nevertheless, these coursebook texts exhibit a distinct form of human authorship, shaped by the instructional requirements of students learning a second language. The ability of AI to produce simplified texts that are on par with those created by humans remains an open question. To fill this gap, we conducted a Multi-Dimensional Analysis (Biber, 1988, 1995; Berber Sardinha & Veirano Pinto, 2014, 2019) of our English Language Teaching textbook corpus (ELTT corpus), encompassing 106,840 words from 500 texts across 19 different registers. These texts, sourced from 43 books by major publishers over 25 years (1996 to 2021), span B2 and C1 levels, with an equal number of texts from each level. Five dimensions were identified, namely (1) Persuasion, speaker engagement, and personal opinion vs. Expression of analysis and technical information; (2) Expressive, interactive, speculative discourse with stance marking; (3) Formal, informative, detailed composition; (4) Narrative and descriptive accounts; (5) Summarized abstracted overviews. Each dimension comprises a set of correlated grammatical features performing the major functions corresponding to the dimensions.
As a comparison sample, we created an AI-generated corpus (AI-ELTT corpus) using ChatGPT to simulate textbook texts, resulting in 500 comparable texts. Overall, the results showed that AI-generated EFL coursebook texts differ from their human-written counterparts. First, AI struggles to produce texts that emphasize persuasion, speaker engagement, and personal opinion. Instead, AI-generated texts are characterized by the expression of analysis and technical information. Second, AI has difficulty producing language that is expressive, interactive, and speculative with stance marking, and these features occur less frequently in its output. Given these differences, it was possible to successfully differentiate AI from human texts in more than 80% of cases.
Submission Number: 18
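The abstract does not spell out how texts are scored on each dimension. In Multi-Dimensional Analysis as described by Biber (1988), a text's dimension score is conventionally obtained by standardizing each feature's normalized frequency across the corpus and then summing the standardized values of the features that load on that dimension, with negatively loading features subtracted. The following is a minimal sketch of that general procedure, not the authors' implementation. The feature names, the rates, and the assignment of features to poles are hypothetical and chosen only for illustration; the use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of values to mean 0 and standard deviation 1."""
    m = mean(values)
    s = pstdev(values)  # assumption: population SD over the corpus
    return [(v - m) / s if s else 0.0 for v in values]


def dimension_scores(rates, positive, negative):
    """Return one dimension score per text.

    rates    -- list of dicts, one per text, mapping a feature name to its
                normalized frequency (e.g. occurrences per 1,000 words)
    positive -- features loading on the positive pole of the dimension
    negative -- features loading on the negative pole of the dimension
    """
    z = {f: zscores([r[f] for r in rates]) for f in positive + negative}
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(len(rates))
    ]


# Hypothetical rates per 1,000 words for three texts (illustration only).
corpus = [
    {"first_person_pronouns": 10.0, "nominalizations": 6.0},
    {"first_person_pronouns": 20.0, "nominalizations": 4.0},
    {"first_person_pronouns": 30.0, "nominalizations": 2.0},
]

scores = dimension_scores(
    corpus,
    positive=["first_person_pronouns"],  # hypothetical positive-pole feature
    negative=["nominalizations"],        # hypothetical negative-pole feature
)
print(scores)
```

In this sketch, a text with many positive-pole features and few negative-pole features receives a high score, so comparing mean scores between the human-written and AI-generated corpora on a given dimension shows which pole each corpus leans toward.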