Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Zero-shot Cross Language Text Classification
Dan Svenstrup, Jonas Meinertz Hansen, Ole Winther
Feb 15, 2018 (modified: Feb 15, 2018)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:Labeled text classification datasets are typically only available in a few select languages. In order to train a model for e.g news categorization in a language $L_t$ without a suitable text classification dataset there are two options. The first option is to create a new labeled dataset by hand, and the second option is to transfer label information from an existing labeled dataset in a source language $L_s$ to the target language $L_t$. In this paper we propose a method for sharing label information across languages by means of a language independent text encoder. The encoder will give almost identical representations to multilingual versions of the same text. This means that labeled data in one language can be used to train a classifier that works for the rest of the languages. The encoder is trained independently of any concrete classification task and can therefore subsequently be used for any classification task. We show that it is possible to obtain good performance even in the case where only a comparable corpus of texts is available.
TL;DR:Cross Language Text Classification by universal encoding
Keywords:Cross Language Text Classification, Neural Networks, Machine Learning
Enter your feedback below and we'll get back to you as soon as possible.