The Reverse Turing Test for Evaluating Interpretability Methods on Unknown Tasks

Anonymous

15 Oct 2020 (modified: 05 May 2023) | HAMLETS @ NeurIPS 2020 | Readers: Everyone

Abstract: The Turing Test evaluates a computer program’s ability to mimic human behaviour. The Reverse Turing Test, conversely, evaluates a human’s ability to mimic machine behaviour in a forward prediction task. We propose using the Reverse Turing Test to evaluate the quality of interpretability methods. It improves on previous experimental protocols for human evaluation of interpretability methods by (a) including a training phase and (b) masking the task. Combined, these two changes enable us to evaluate models independently of their quality, in a way that is unbiased by the participants’ previous exposure to the task. We present a human evaluation of LIME across five NLP tasks in a Latin Square design and analyze the effect of masking the task in forward prediction experiments. Additionally, we demonstrate a fundamental limitation of LIME and show how this limitation is detrimental to human forward prediction in some NLP tasks.
