Generative Adversarial Nets for Multiple Text Corpora

Diego Klabjan; Baiyang Wang

25 Sept 2019 (modified: 22 Jun 2025) | ICLR 2020 Conference Blind Submission | Readers: Everyone

TL;DR: Constructing robust embeddings by means of GANs from multiple corpora

Abstract: Generative adversarial nets (GANs) have been successfully applied to the artificial generation of image data. For text data, much work has addressed the artificial generation of natural language from a single corpus. We instead take multiple text corpora as the input data, for which GANs have two applications: (1) the creation of consistent cross-corpus word embeddings, given different word embeddings per corpus; and (2) the generation of robust bag-of-words document embeddings for each corpus. We demonstrate our GAN models on real-world text data sets from different corpora, and show that embeddings from both models lead to improvements in supervised learning problems.

Keywords: GAN, NLP, embeddings

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/generative-adversarial-nets-for-multiple-text/code)
