Abstract: Recently a new language model, the random forest language model (RFLM), has been proposed and shown encouraging results in speech recognition tasks. In this paper we applied the RFLM to language identification tasks. We proposed a shared backoff smoothing to deal with data sparseness problem. Experiments were conducted on a subset of NIST 2003 language recognition evaluation data. The RFLM obtained 15.7% relative error rate reduction comparing with the standard trigram LM. The RFLM can be used as a counterpart to n-gram LM and BTLM for system fusion. We also empirically studied the relation between system performance and the tree numbers in a RFLM.
Loading