# **A Text GAN for Language Generation with Non-Autoregressive Generator**

## Requirements

* ``Python==3.7`` is required.
* Use the following command to install requirements.
```
pip install -r requirements.txt
```
* We recommend GPU builds of PyTorch and TensorFlow to speed up training and evaluation.

## Training

### Train on the synthetic data

```
python run.py --name synthetic --gp 0.1 --dr 0.1 --batch_size 128 --epoch 50 --datapath ./synthetic_data --wvpath None
```

### Train on the real data

* For the COCO dataset

```
python run.py --name mscoco
```

* For the SNLI dataset

```
python run.py --name snli --datapath ./snli_data
```

The training scripts automatically download [pretrained GloVe word vectors](http://nlp.stanford.edu/data/glove.6B.zip). To train without pretrained word vectors, add `--wvpath None`.

They also automatically download the [Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder-large/3) for FED evaluation.

You can use TensorBoard to view the training curves:

```
tensorboard --logdir=./tensorboard
```

## Evaluation

### Evaluation on the synthetic data

* Generate the samples from the model

```
python run.py --name synthetic --datapath ./synthetic_data --wvpath None --restore synthetic_last --mode test --droprate [DROPRATE]
```

`[DROPRATE]` is a float that sets the droprate used at inference, for example ``0.25``.

This generates a text file at `./output/synthetic.txt`.

* Evaluate the generated samples

```
python evaluate_synthetic.py synthetic
```

The output is the OracleNLL of the generated samples.

### Evaluation on the real data

* Generate the samples from the model

```
python run.py --name mscoco --restore mscoco_best --mode test --droprate [DROPRATE]

python run.py --name snli --restore snli_best --datapath ./snli_data --mode test --droprate [DROPRATE]
```

`[DROPRATE]` is a float that sets the droprate used at inference, for example `0.25`.

This generates a text file at `./output/mscoco.txt` or `./output/snli.txt`.
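
The results tables in this README report several droprates, so it can be convenient to script the generation step. The following is a minimal sketch, not part of the repository: it only assembles the commands shown above for a list of droprates and prints them. The helper name `build_test_command` and the droprate list are illustrative assumptions.

```python
import shlex


def build_test_command(name, restore, droprate, datapath=None):
    """Assemble the `run.py` test-mode command shown above as an argument list."""
    cmd = ["python", "run.py", "--name", name, "--restore", restore]
    if datapath is not None:
        cmd += ["--datapath", datapath]
    cmd += ["--mode", "test", "--droprate", str(droprate)]
    return cmd


# Illustrative droprates (assumption: the values listed in the Results tables).
DROPRATES = [0.1, 0.15, 0.2, 0.25]

for rate in DROPRATES:
    cmd = build_test_command("snli", "snli_best", rate, datapath="./snli_data")
    # Only print the command here; to execute it instead,
    # pass `cmd` to subprocess.run(cmd, check=True).
    print(" ".join(shlex.quote(part) for part in cmd))
```

Since every run writes to the same `./output/<name>.txt` path, you will probably want to evaluate or rename that file before launching the next run.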

* Evaluate the generated samples

```
python evaluate_real.py mscoco

python evaluate_snli.py snli
```
The results are printed to the screen and written to `./output/mscoco.res` or `./output/snli.res`.

* The output will look like this:

```
{
	'FED': 0.13046254, 
	'fw-bleu': 0.28178000203224546, 
	'bw-bleu': 0.23913991694210726, 
	'fw-bw-bleu': 0.25871479982801604, 
	'fw-bw-bleu hashvalue': '4b22cf4887b81f7987fadec37f207fa72a168cd64d5b3f375ae8adfa844a4d33',
	'fwppl': 50.174534845570065, 
	'bwppl': 95.30268546437244, 
	'fwppl hashvalue': '044675fb77e831c9db21888b2f9e61c4b02228f11cfa03bf04ec5c7d526e36c3',
	'bwppl hashvalue': '044675fb77e831c9db21888b2f9e61c4b02228f11cfa03bf04ec5c7d526e36c3'
}
```

The LM score reported in papers is calculated as `log(fwppl)`. Each hash value is a checksum used to verify that the same test data was used.
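
As a quick sanity check, the sketch below derives the LM score from the sample output above. It assumes the natural logarithm, which gives a value of about 3.92 for the sample, in the same range as the LM scores in the results tables. It also checks a second assumption: that `fw-bw-bleu` is the harmonic mean of `fw-bleu` and `bw-bleu`. The sample values are consistent with that reading, but the evaluation scripts remain the authoritative definition.

```python
import math

# Values copied from the sample output above.
result = {
    "fw-bleu": 0.28178000203224546,
    "bw-bleu": 0.23913991694210726,
    "fw-bw-bleu": 0.25871479982801604,
    "fwppl": 50.174534845570065,
}

# LM score: log of the forward perplexity (assumption: natural logarithm).
lm_score = math.log(result["fwppl"])

# Assumption: fw-bw-bleu is the harmonic mean of forward and backward BLEU.
fw, bw = result["fw-bleu"], result["bw-bleu"]
harmonic_bleu = 2 * fw * bw / (fw + bw)

print(f"LM score: {lm_score:.3f}")
print(f"harmonic mean of fw/bw BLEU: {harmonic_bleu:.5f}")
print(f"reported fw-bw-bleu:         {result['fw-bw-bleu']:.5f}")
```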

### Sentence Manipulation

* Run the following command to generate a pool of sentences from randomly sampled Z. The pool is stored at `./output/record_mscoco.pkl`.

```
python run.py --name mscoco --restore mscoco_best --mode record
```

* You can then use the following commands for sentence manipulation.

* Sentence Editing (Offset Vector)

```
python run.py --name mscoco --restore mscoco_best --mode adddiff
```

* Sentence Editing (Gradient Descent)

```
python run.py --name mscoco --restore mscoco_best --mode walkgrad
```
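
As a rough illustration of the offset-vector idea, the sketch below applies generic latent-vector arithmetic: it averages the latent vectors of two groups of sentences, takes the difference, and adds it to a new latent vector. This is an assumption about the general technique, not the code behind `--mode adddiff`. The function names and the toy 3-dimensional vectors are hypothetical, and a real edit would decode the shifted vector with the trained generator.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def offset_edit(z, with_attr, without_attr, scale=1.0):
    """Shift z along the direction from the `without_attr` mean to the `with_attr` mean."""
    mu_with = mean_vector(with_attr)
    mu_without = mean_vector(without_attr)
    return [zi + scale * (a - b) for zi, a, b in zip(z, mu_with, mu_without)]


# Toy latent vectors (hypothetical values, for illustration only).
z_with = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]     # sentences that have the attribute
z_without = [[0.0, 0.0, 2.0], [0.0, 2.0, 2.0]]  # sentences that lack it

z_new = offset_edit([0.5, 0.5, 0.5], z_with, z_without)
print(z_new)
```

The `scale` argument is an illustrative knob for how far to move along the offset direction; a value of `0.0` leaves the vector unchanged.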

## Results

Results on COCO, reported as mean and standard deviation over 3 models trained with different seeds:

| Model name         | LM Score | Forward BLEU | Backward BLEU | Harmony BLEU | FED |
| ------------------ |---------------- | -------------- | ------------------ | ------------------ | ------------------ |
| NAGAN(droprate=0.25) |     $3.69\pm0.05$     | $0.315\pm0.007$ | $0.257\pm0.013$ | $0.283\pm0.011$ | $0.108\pm0.005$ |
| NAGAN(droprate=0.2) | $3.54\pm0.05$ | $0.342\pm0.008$ | $0.256\pm0.011$ | $0.293\pm0.010$ | $0.111\pm0.005$ |
| NAGAN(droprate=0.15) | $3.41\pm0.07$ | $0.371\pm0.018$ | $0.252\pm0.014$ | $0.301\pm0.015$ | $0.118\pm0.005$ |
| NAGAN(droprate=0.1) | $3.30\pm0.07$ | $0.391\pm0.023$ | $0.245\pm0.017$ | $0.301\pm0.020$ | $0.128\pm0.011$ |

Results on SNLI, reported as mean and standard deviation over 3 models trained with different seeds:

| Model name         | LM Score | Forward BLEU | Backward BLEU | Harmony BLEU | FED |
| ------------------ |---------------- | -------------- | ------------------ | ------------------ | ------------------ |
| NAGAN(droprate=0.25) |     $3.96\pm0.15$     | $0.279\pm0.018$ | $0.205\pm0.003$ | $0.236\pm0.004$ | $0.059\pm0.011$ |
| NAGAN(droprate=0.2) | $3.78\pm0.15$ | $0.310\pm0.021$ | $0.205\pm0.004$ | $0.246\pm0.004$ | $0.067\pm0.014$ |
| NAGAN(droprate=0.15) | $3.61\pm0.15$ | $0.338\pm0.019$ | $0.203\pm0.006$ | $0.253\pm0.001$ | $0.078\pm0.019$ |
| NAGAN(droprate=0.1) | $3.48\pm0.13$ | $0.363\pm0.019$ | $0.196\pm0.009$ | $0.254\pm0.003$ | $0.096\pm0.027$ |

