Abstract: We present a new large-scale corpus of 54,772 queries with manually annotated spelling corrections. For 9,170 of the queries (16.74%), spelling variants that differ from the original query are proposed. Our corpus is an order of magnitude larger than other publicly available query spelling corpora. In addition to releasing the corpus, we provide an implementation of the winning entry of the 2011 Microsoft Speller Challenge and compare it, on the publicly available corpora, to spelling corrections mined from Google and Bing. This comparison also sheds some light on the spelling correction performance of state-of-the-art commercial search systems.