A Large-Scale Query Spelling Correction Corpus

2017 (modified: 12 Nov 2022) | SIGIR 2017 | Readers: Everyone
Abstract: We present a new large-scale collection of 54,772 queries with manually annotated spelling corrections. For 9,170 of the queries (16.74%), spelling variants that differ from the original query are proposed. Our new corpus is an order of magnitude larger than other publicly available query spelling corpora. In addition to releasing the corpus, we provide an implementation of the winner of the 2011 Microsoft Speller Challenge and compare it, on the different publicly available corpora, to spelling corrections mined from Google and Bing. This comparison also sheds some light on the spelling correction performance of state-of-the-art commercial search systems.