MULTILINGUAL EVALUATION OF HUMAN VS. AI TEXT CLASSIFICATION WITH ZERO-SHOT ANALYSIS OF CONTEMPORARY LLM ARCHITECTURES.

Published: 14 Dec 2025, Last Modified: 11 Jan 2026, Venue: LM4UC@AAAI2026, License: CC BY 4.0
Keywords: AI text detection, Multilingual text classification, Machine learning classifiers, Transformer models, Zero-shot learning, Large language models, Cross-lingual evaluation, Natural language processing, AI content detection, Deep learning, Computational linguistics
TL;DR: Traditional ML classifiers outperform transformers in multilingual AI text detection. Critical finding: in zero-shot tests, commercial LLMs are detected consistently while smaller open-source LLMs go undetected, which is a major security gap.
Abstract: Distinguishing human-written from AI-generated text has become an essential problem in maintaining the authenticity of digital content worldwide. Despite recent advances, current detection tools largely cater to English text only, leaving a major gap in multilingual coverage. This paper introduces the first end-to-end multilingual approach to human vs. AI text classification for Hindi and Spanish. We compare traditional machine learning classifiers and state-of-the-art transformer models in three stages: baseline validation on English data, multilingual evaluation on carefully filtered Hindi and Spanish datasets, and zero-shot generalization to English outputs of various modern large language models such as Gemini and Phi. Our findings show high accuracy and F1-scores, with models such as XGBoost and T5 achieving perfect scores (1.00) in multilingual settings. Notably, classical models outperform transformer-based methods in cross-lingual settings, with F1-scores higher by up to 0.17. Zero-shot experiments indicate that current LLMs are not uniformly detectable: commercial models are detected consistently, whereas smaller open-source models go undetected. This paper addresses critical gaps in text authenticity verification, supporting secure multilingual AI text detection for real-world applications in education, media, and content verification.
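As an illustration of the two quantities the abstract reports, the following is a minimal sketch of how an F1-score and a per-generator detection rate could be computed from a detector's predictions. It is not the authors' evaluation code: the function names, the label convention (1 = AI-generated, 0 = human), the source names `model_a` and `model_b`, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class (assumed here to mean 'AI-generated')."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def detection_rate_by_source(sources, y_pred, positive=1):
    """Fraction of texts from each source that the detector flags as AI."""
    flagged = defaultdict(int)
    total = defaultdict(int)
    for source, pred in zip(sources, y_pred):
        total[source] += 1
        flagged[source] += int(pred == positive)
    return {source: flagged[source] / total[source] for source in total}


# Toy data, invented for illustration only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]
sources = ["model_a", "model_a", "model_b", "model_b", "human", "human"]

overall_f1 = f1_score(y_true, y_pred)
rates = detection_rate_by_source(sources, y_pred)
```

Breaking the detection rate down by source is what makes uneven detectability visible: an aggregate F1 can look acceptable even when every text from one generator is missed.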
Submission Number: 38