Randomized Language Models via Perfect Hash Functions

David Talbot, Thorsten Brants

2008 (modified: 13 Nov 2022), ACL 2008

Abstract: We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters. The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy pruning. We demonstrate the space savings of the scheme via machine translation experiments within a distributed language modeling framework.
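
The core idea in the abstract, storing only a short fingerprint and a value in the slot a perfect hash assigns to each n-gram, can be illustrated with a toy sketch. This is not the authors' construction: the class name `RandomizedLM`, the parameters `fp_bits` and `max_tries`, and the brute-force seed search used to find a collision-free hash are all assumptions made for illustration, and the search is only practical for very small n-gram sets.

```python
import hashlib

FP_SEED = -1  # hypothetical seed reserved for fingerprints; slot seeds are >= 0


def _h(key: str, seed: int) -> int:
    """Deterministic 64-bit hash of `key` under `seed` (illustrative choice)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class RandomizedLM:
    """Toy n-gram store: each slot holds (fingerprint, value), never the n-gram."""

    def __init__(self, ngram_values, fp_bits=12, max_tries=10000):
        self.fp_bits = fp_bits
        self.size = max(1, 2 * len(ngram_values))
        keys = list(ngram_values)
        # Brute-force search for a seed that maps every key to a distinct slot,
        # i.e. a perfect hash for this fixed key set. Feasible only for toy sizes.
        for seed in range(max_tries):
            if len({_h(k, seed) % self.size for k in keys}) == len(keys):
                self.seed = seed
                break
        else:
            raise RuntimeError("no collision-free seed found")
        self.table = [None] * self.size
        for k, v in ngram_values.items():
            self.table[self._slot(k)] = (self._fingerprint(k), v)

    def _slot(self, key):
        return _h(key, self.seed) % self.size

    def _fingerprint(self, key):
        return _h(key, FP_SEED) % (1 << self.fp_bits)

    def lookup(self, ngram):
        """Return the stored value, or None if the fingerprint does not match."""
        entry = self.table[self._slot(ngram)]
        if entry is None:
            return None
        fingerprint, value = entry
        return value if fingerprint == self._fingerprint(ngram) else None


if __name__ == "__main__":
    model = {"the cat": -1.2, "cat sat": -0.7, "sat on": -0.9, "on the": -0.4}
    lm = RandomizedLM(model)
    print(lm.lookup("the cat"))
```

In this sketch, a stored n-gram always returns its value, while an unseen n-gram that lands on an occupied slot is mistaken for a stored one only if its fingerprint happens to match, which under a uniform-hashing assumption occurs with probability about 2^-fp_bits. Widening the fingerprint trades space for a lower false-positive rate.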
