Cross-language retrieval using link-based language models

Benjamin Roth, Dietrich Klakow

2010 (modified: 12 Nov 2022), SIGIR 2010

Abstract: We propose a cross-language retrieval model that uses Wikipedia as its only training corpus. The main contributions of our work are: (1) a translation model based on linked text in Wikipedia, together with an associated term-weighting method; and (2) a scheme for interpolating the link translation model with retrieval based on Latent Dirichlet Allocation (LDA). On the CLEF 2000 data, we achieve an improvement over the best German-English system in the bilingual track (not statistically significant) and a statistically significant improvement over a baseline based on machine translation.
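The abstract does not spell out the model's equations, so the following is only a minimal sketch of the general idea, built on several assumptions. It is not the authors' method. The sketch assumes three things. First, link pairs are (German term, English term) tuples, for example a German anchor-text word paired with a word from the English counterpart of the linked article. Second, translation probabilities t(g | e) are plain relative frequencies over those pairs, and the paper's term-weighting method is not reproduced. Third, the two models are combined by linear interpolation with a hypothetical weight `lam`. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def estimate_translation(pairs):
    """Estimate t(g | e) by relative frequency from (german_term, english_term) pairs.

    Assumption: each pair comes from linked text in Wikipedia; no term weighting is applied.
    """
    counts = defaultdict(Counter)
    for g, e in pairs:
        counts[e][g] += 1
    table = {}
    for e, c in counts.items():
        total = sum(c.values())
        table[e] = {g: n / total for g, n in c.items()}
    return table


def translation_prob(g, doc_terms, table):
    """P_trans(g | d) = sum_e t(g | e) * P_ml(e | d) for an English document d."""
    tf = Counter(doc_terms)
    n = len(doc_terms)
    return sum(table.get(e, {}).get(g, 0.0) * c / n for e, c in tf.items())


def lda_prob(g, theta_d, phi):
    """P_lda(g | d) = sum_z phi_z(g) * theta_d(z), assuming topics shared across languages."""
    return sum(theta_z * phi_z.get(g, 0.0) for theta_z, phi_z in zip(theta_d, phi))


def score(query, doc_terms, table, theta_d, phi, lam=0.5, eps=1e-12):
    """Query log-likelihood under lam * P_trans + (1 - lam) * P_lda (hypothetical lam)."""
    total = 0.0
    for g in query:
        p = lam * translation_prob(g, doc_terms, table) + (1 - lam) * lda_prob(g, theta_d, phi)
        total += math.log(p + eps)  # eps avoids log(0) for unseen terms
    return total


# Toy data (hypothetical): German query terms, English documents, two topics.
pairs = [("haus", "house"), ("haus", "house"), ("gebaeude", "house"), ("katze", "cat")]
table = estimate_translation(pairs)

doc1 = ["house", "garden", "house"]
doc2 = ["cat", "dog"]
phi = [{"haus": 0.6, "garten": 0.4}, {"katze": 0.7, "hund": 0.3}]  # P(German term | topic)
theta1 = [0.9, 0.1]  # P(topic | doc1)
theta2 = [0.1, 0.9]  # P(topic | doc2)

query = ["haus"]
s1 = score(query, doc1, table, theta1, phi)
s2 = score(query, doc2, table, theta2, phi)
print(s1, s2)
```

In this toy setup, the query "haus" scores higher against the house-related document. Both components contribute: the link translation table maps "haus" to "house", and the LDA topic mixture favors the same topic. Linear interpolation lets either model supply probability mass when the other has none, as happens here for `doc2`, where the translation component is zero.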
