New Areas of Application of Comparable Corpora

Reinhard Rapp, Vivian Xu, Michael Zock, Serge Sharoff, Richard S. Forsyth, Bogdan Babych, Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi

2019 (modified: 17 Nov 2021), Using Comparable Corpora for Under-Resourced Areas of Machine Translation 2019

Abstract: This chapter describes several approaches to using comparable corpora beyond MT for under-resourced languages, the primary focus of the ACCURAT project. Section 7.1, which is based on Rapp and Zock (Automatic dictionary expansion using non-parallel corpora. In: A. Fink, B. Lausen, W. Seidel, & A. Ultsch (Eds.), Advances in Data Analysis, Data Handling and Business Intelligence. Proceedings of the 32nd Annual Meeting of the GfKl, 2008. Springer, Heidelberg, 2010), addresses the task of creating resources for bilingual dictionaries using a seed lexicon. Section 7.2, which is based on Rapp et al. (Identifying word translations from comparable documents without a seed lexicon. Proceedings of LREC 2012, Istanbul, 2012), develops and evaluates a novel methodology for creating bilingual dictionaries without an initial lexicon. Section 7.3 proposes a novel system that can extract Chinese–Japanese parallel sentences from quasi-comparable and comparable corpora.
