Turing-test Interview Emulation

ACL ARR 2026 January Submission 1608 Authors

30 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: dialogue evaluation, benchmark, llms
Abstract: The paper presents a novel artificial intelligence evaluation methodology grounded in the principles of the classic Turing test. We propose the Turing-test Interview Emulation (TiE) framework, which simulates a structured dialogue for the behavioral assessment of model capabilities. In contrast to the original test, our methodology employs a sequential question-answer format across diverse thematic categories, requiring the model to maintain dialogue context and perform comparative analysis of candidate responses. The model is tasked not with selecting a single correct answer but with identifying the most appropriate option from several alternatives, thereby complicating the decision-making process and introducing an additional reasoning step. The paper introduces both text and multimodal versions of the TiE dataset in English and Russian. Using this benchmark, we conduct a comprehensive comparative evaluation of a range of open-source and proprietary large language models (LLMs) and vision-language models (VLLMs).
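The interview protocol described in the abstract — a sequential question-answer dialogue in which the model must track context and pick the most appropriate answer from several candidates — can be sketched as a simple evaluation loop. This is an illustrative sketch only, not the authors' implementation; the function and field names (`evaluate_tie`, `question`, `candidates`, `label`) are hypothetical.

```python
# Hypothetical sketch of a TiE-style sequential evaluation loop:
# the model sees the dialogue so far, a new question, and several
# candidate answers, and must select the most appropriate one.
from typing import Callable, List


def evaluate_tie(items, choose: Callable[[str, str, List[str]], int]) -> float:
    """Run the sequential interview; return the model's selection accuracy."""
    history: List[str] = []
    correct = 0
    for item in items:
        context = "\n".join(history)  # the model must maintain dialogue context
        picked = choose(context, item["question"], item["candidates"])
        if picked == item["label"]:
            correct += 1
        # Carry the question and the chosen answer forward as context.
        history.append(f"Q: {item['question']}")
        history.append(f"A: {item['candidates'][picked]}")
    return correct / len(items)


# Toy run with a trivial stand-in "model" that picks the longest candidate.
items = [
    {"question": "What is your favourite season?",
     "candidates": ["Winter.", "I enjoy autumn most; the colours are calming."],
     "label": 1},
    {"question": "Why?",
     "candidates": ["Because.", "As I said, the autumn colours relax me."],
     "label": 1},
]
longest = lambda ctx, q, cands: max(range(len(cands)), key=lambda i: len(cands[i]))
print(evaluate_tie(items, longest))  # 1.0 for this toy baseline
```

A real run would replace the `longest` stub with an LLM call that receives the context, question, and candidates in a prompt and returns the index of its chosen answer.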
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking; language resources; multilingual corpora; NLP datasets; automatic evaluation of datasets; evaluation methodologies
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English, Russian
Submission Number: 1608