Keywords: AI-generated text, linguistic diversity, large language models, cross-linguistic comparison
Working Group: WG3: Multilingual and cross-lingual language technology; WG4: Quantifying and promoting diversity
Abstract: With the recent rise in the use of generative AI tools, AI-generated texts are establishing a large presence in various professional fields. A number of studies, mostly on English, have already examined the linguistic diversity of AI-generated texts, and they commonly report a lack of such diversity, which raises concerns. To contribute to this line of research, our study extends the analysis to AI-generated essays in Slovenian and English. We compare their lexical, n-gram, and syntactic diversity with that of human-written texts, focusing in particular on whether the same patterns arise in both languages. Our experiments show a clear gap in linguistic diversity between LLM-generated and human-written texts: diversity is generally lower in AI-generated essays, and the same patterns emerge in both languages. An important exception is lexical diversity, which varies depending on the model used.
Tracks For Type Of Contribution: Work in progress
Do You Need Visa To Attend The 4th UniDive General Meeting In Romania: No
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 58