Counter Turing Test ($CT^2$): Investigating AI-Generated Text Detection for Hindi - Ranking LLMs based on Hindi AI Detectability Index ($ADI_{hi}$)

Anonymous

16 Dec 2023, ACL ARR 2023 December Blind Submission, Readers: Everyone
Abstract: The widespread adoption of large language models (LLMs) such as GPTs, BARD, and others has raised concerns about the risks and repercussions of misusing AI-generated text, calling for increased vigilance. Although these models are trained primarily for English, their extensive training on vast datasets covering almost the entire web equips them to perform well in numerous other languages, such as Hindi and Spanish. AI-generated text detection (AGTD) has quickly attracted research attention: some initial methods have been proposed, soon followed by techniques to bypass detection. In this paper, we report our investigation of AGTD for the Hindi language. We i) examine 16 LLMs to evaluate their proficiency in generating Hindi text, introducing the AI-generated news article in Hindi ($AG_{hi}$) dataset; ii) thoroughly evaluate the effectiveness of four recently proposed AGTD techniques, ConDA, J-Guard, RADAR, and Intrinsic Dimension Estimation, for detecting AI-generated Hindi text; and iii) propose the Hindi AI Detectability Index ($ADI_{hi}$), which provides a spectrum for understanding the evolving eloquence of AI-generated text in Hindi and the efficacy of available AGTD techniques in countering adversarial use of LLMs for Hindi.
Paper Type: long
Research Area: Resources and Evaluation
Contribution Types: Data resources
Languages Studied: English, Hindi