Exact Paired Permutation Testing Algorithms for NLP Systems

Anonymous

16 Jan 2022 (modified: 05 May 2023) | ACL ARR 2022 January Blind Submission | Readers: Everyone
Abstract: Significance testing has played a vital role in the development of NLP systems, providing confidence that one system is indeed better than another. However, many significance tests involve hard computational problems, so we rely on approximation methods such as Monte Carlo sampling. In this paper, we provide an exact dynamic programming algorithm that performs the paired permutation test, a widely used test for comparing two systems, in time quadratic in the size of the dataset, for the case of comparing the accuracies of two classification systems. We show that Monte Carlo approximations are often too noisy to reliably determine whether we can reject the null hypothesis at a significance level of $\alpha \approx 0.05$ for any number of sentences $N$. Additionally, we show that our exact algorithm is more efficient than the approximation algorithm for $N \le 10{,}000$.
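To make the setting concrete, the following is a minimal sketch of one way an exact paired permutation test for an accuracy difference could be computed with dynamic programming. It is an illustration under stated assumptions, not necessarily the algorithm proposed in the paper: it assumes the test statistic is the absolute difference in the number of correct predictions, that the test is two-sided, and that swapping a pair of outputs flips the sign of the per-instance difference. The function name `exact_paired_permutation_pvalue` and its arguments are hypothetical.

```python
from fractions import Fraction


def exact_paired_permutation_pvalue(correct_a, correct_b):
    """Exact two-sided p-value for the difference in accuracy.

    correct_a[n] and correct_b[n] are 0/1 (or bool) flags saying whether
    system A and system B classified instance n correctly.
    Assumption: the statistic is |#correct(A) - #correct(B)|.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same instances")

    # Per-instance difference in {-1, 0, +1}; swapping the pair negates it.
    diffs = [int(a) - int(b) for a, b in zip(correct_a, correct_b)]
    observed = abs(sum(diffs))

    # counts[s] = number of swap assignments over the instances seen so far
    # whose signed differences sum to s. At most 2N + 1 distinct sums exist,
    # so N steps perform O(N^2) table updates in total.
    counts = {0: 1}
    for d in diffs:
        updated = {}
        for s, c in counts.items():
            for t in (s + d, s - d):  # keep the pair as is, or swap it
                updated[t] = updated.get(t, 0) + c
        counts = updated

    total = 2 ** len(diffs)  # every swap assignment is equally likely
    extreme = sum(c for s, c in counts.items() if abs(s) >= observed)
    return Fraction(extreme, total)
```

For example, `exact_paired_permutation_pvalue([1, 1, 1, 0], [0, 0, 0, 0])` enumerates all 16 swap assignments implicitly and returns an exact fraction rather than a sampled estimate. The sketch uses exact integer counts, so the cost of big-integer arithmetic is not included in the quadratic bound on table updates.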
Paper Type: short