Building Comparable Corpora Based on Bilingual LDA Model

Zede Zhu, Miao Li, Lei Chen, Zhenxin Yang

2013 (modified: 22 Jul 2025) | ACL (2) 2013 | Readers: Everyone

Abstract: Comparable corpora are important basic resources in cross-language information processing. However, existing methods of building comparable corpora, which rely on translation-equivalent words and related features, cannot evaluate the topical relation between document pairs. This paper adopts the bilingual LDA model to predict the topical structures of documents and proposes three algorithms for measuring the similarity of documents in different languages. Experiments show that the proposed method can obtain similar documents with consistent topics and offers better adaptability and stability.
