Sketch Algorithms for Estimating Point Queries in NLP

Amit Goyal, Hal Daumé III, Graham Cormode

2012 (modified: 10 Nov 2022) | EMNLP-CoNLL 2012 | Readers: Everyone

Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory-intensive, so recent work has proposed using compact approximate "sketches" of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and under-estimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best on all three tasks.
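
To make the idea of a point query on a sketch concrete, here is a minimal Python illustration. The abstract does not name the ten methods, so the choice of a Count-Min sketch below is an assumption made for illustration only, as are the class name `CountMinSketch`, the parameters `width` and `depth`, and the use of salted BLAKE2b as the hash family. It is not presented as any of the paper's variants. Note that this basic structure can only over-estimate a count, so it shows just one of the two error types the abstract mentions.

```python
import hashlib


class CountMinSketch:
    """Illustrative Count-Min sketch: `depth` rows of `width` counters."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One hash function per row, obtained by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Add `count` to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item):
        # Point query: the minimum over rows is the least-contaminated counter.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch(width=2000, depth=4)
    for token in "the cat sat on the mat the end".split():
        sketch.update(token)
    # The estimate is never below the true count (3 for "the").
    print(sketch.query("the"))
```

Memory use is fixed at `width * depth` counters regardless of vocabulary size, which is the trade-off the abstract describes: bounded memory in exchange for approximate counts.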
