# EIT: Enhanced Interactive Transformer for Sequence Generation
This code is based on Fairseq v0.10.2.
## Requirements and Installation
- PyTorch version >= 1.5.0
- Python version >= 3.6
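
To check an environment against the minimums above, a small helper like the following may be useful. It is a sketch rather than part of the repository: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (GNU coreutils) and that `python3` is the interpreter used for training.

```bash
#!/usr/bin/env bash
# Return success (0) if version $1 >= version $2, using `sort -V`.
# version_ge is a hypothetical helper, not a script shipped with this repo.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

if command -v python3 >/dev/null 2>&1; then
    # Python: compare the interpreter version against the 3.6 minimum.
    py_ver=$(python3 -c 'import sys; print("%d.%d.%d" % sys.version_info[:3])')
    if version_ge "$py_ver" 3.6; then
        echo "Python $py_ver: OK"
    else
        echo "Python $py_ver: too old (need >= 3.6)"
    fi

    # PyTorch: strip any local suffix such as "+cu118" before comparing.
    torch_ver=$(python3 -c 'import torch; print(torch.__version__.split("+")[0])' 2>/dev/null || true)
    if [ -z "$torch_ver" ]; then
        echo "PyTorch: not installed"
    elif version_ge "$torch_ver" 1.5.0; then
        echo "PyTorch $torch_ver: OK"
    else
        echo "PyTorch $torch_ver: too old (need >= 1.5.0)"
    fi
fi
```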

## Prepare Data
### For Machine Translation 

#### 1. Download [WMT'14 En-De](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) and [WMT'16 En-Ro](https://drive.google.com/uc?id=1YrAwCEuktG-iDVxtEW-FE72uFTLc5QMl)

#### 2. Preprocess the dataset

### For Abstractive Summarization Task

#### 1. Download the [CNN dataset](https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ) and the [Daily Mail dataset](https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfM1BxdkxVaTY2bWs)


#### 2. Generate the binary dataset ```data-bin/cnndm```

```bash preprocess_cnndaily_bin.sh path/to/cnndm_raw_data```

### For Grammatical Error Correction Task  

  #### 1. Download the [FCE v2.1 dataset](https://www.cl.cam.ac.uk/research/nl/bea2019st/data/fce_v2.1.bea19.tar.gz), [Lang-8 Corpus of Learner English dataset](https://docs.google.com/forms/d/e/1FAIpQLSflRX3h5QYxegivjHN7SJ194OxZ4XN_7Rt0cNpR2YbmNV-7Ag/viewform), [NUCLE dataset](https://sterling8.d2.comp.nus.edu.sg/nucle_download/nucle.php), and [W&I+LOCNESS v2.1 dataset](https://www.cl.cam.ac.uk/research/nl/bea2019st/data/wi+locness_v2.1.bea19.tar.gz)

  #### 2. Get the CoNLL-14 test set

  ```bash prepare_conll14_test_data.sh```

  #### 3. Preprocess the dataset

  ```bash preprocess_gec.sh```

  #### 4. Generate the binary dataset ```data-bin/BEA```

  ```bash preprocess_gec_bin.sh```
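
Steps 2 to 4 above can be chained so that a failure in one step stops the rest. The wrapper below is only a sketch: `prepare_gec` and the `DRY_RUN` switch are hypothetical names, and it assumes the three scripts sit in the current directory and take no arguments, as in the commands shown above.

```bash
#!/usr/bin/env bash
# Run the three GEC preparation scripts in order; stop at the first failure.
# Set DRY_RUN=1 to print the commands without executing them.
# prepare_gec and DRY_RUN are illustrative names, not part of the repository.
prepare_gec() {
    local step
    for step in prepare_conll14_test_data.sh preprocess_gec.sh preprocess_gec_bin.sh; do
        echo "+ bash $step"
        if [ "${DRY_RUN:-0}" != "1" ]; then
            bash "$step" || { echo "step failed: $step" >&2; return 1; }
        fi
    done
}

# Preview the commands first; call `prepare_gec` without DRY_RUN to run them.
DRY_RUN=1 prepare_gec
```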

## Train
### For WMT'14 En-De Task  

#### Train an EIT model (48-layer deep model)

```bash train_wmt_en_de_eit_48.sh```

```
#! /usr/bin/bash
set -e

device=0,1,2,3,4,5,6,7
task=wmt-en2de
# must set this tag
tag=pre_48_eit


if [ $task == "wmt-en2de" ]; then
        arch=enhanced_interactive_transformer_t2t_wmt_en_de_48l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.002
        warmup=16000
        max_tokens=4096
        update_freq=2
        weight_decay=0.0
        keep_last_epochs=10
        max_epoch=21
        max_update=
        reset_optimizer=0
        data_dir=google
        src_lang=en
        tgt_lang=de
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --criterion $criterion --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d 
  --seed 1
  --task translation
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir"


adam_betas="'(0.9, 0.997)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi
if [ $reset_optimizer -eq 1 ]; then
cmd=${cmd}" --reset-optimizer "
fi

export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log
```
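
The script above sets `max_tokens`, `update_freq`, and a device list from which `gpu_num` is derived. In Fairseq, `--max-tokens` caps the tokens per batch on each GPU and `--update-freq` accumulates gradients over that many batches, so their product with the GPU count gives a rough upper bound on tokens per optimizer step. The sketch below works this out; `count_gpus` and `tokens_per_update` are hypothetical helper names, not part of the training script.

```bash
#!/usr/bin/env bash
# Count the GPUs in a comma-separated device list such as "0,1,2,3".
count_gpus() {
    local IFS=','
    local -a ids
    read -r -a ids <<< "$1"
    echo "${#ids[@]}"
}

# Upper bound on tokens per optimizer step:
# max_tokens (per GPU, per batch) * update_freq (accumulated batches) * GPUs.
tokens_per_update() {
    local max_tokens=$1 update_freq=$2 gpus=$3
    echo $(( max_tokens * update_freq * gpus ))
}

device=0,1,2,3,4,5,6,7
gpu_num=$(count_gpus "$device")
echo "GPUs: $gpu_num"                                                    # GPUs: 8
echo "tokens per update: $(tokens_per_update 4096 2 "$gpu_num")"         # tokens per update: 65536
```

When training on fewer GPUs, raising `update_freq` by the same factor keeps this product unchanged; for example, 4 GPUs with `update_freq=4` also gives 65536.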


#### Train an E-EIT model (48-layer deep model)

```bash train_wmt_en_de_e_eit_48.sh```

~~~
#! /usr/bin/bash
set -e

device=0,1,2,3,4,5,6,7
task=wmt-en2de
# must set this tag
tag=pre_48_e_eit


if [ $task == "wmt-en2de" ]; then
        arch=efficient_enhanced_interactive_transformer_t2t_wmt_en_de_48l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.002
        warmup=16000
        max_tokens=4096
        update_freq=2
        weight_decay=0.0
        keep_last_epochs=10
        max_epoch=21
        max_update=
        reset_optimizer=0
        data_dir=google
        src_lang=en
        tgt_lang=de
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --criterion $criterion --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d 
  --seed 1
  --task translation
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir"


adam_betas="'(0.9, 0.997)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi
if [ $reset_optimizer -eq 1 ]; then
cmd=${cmd}" --reset-optimizer "
fi

export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log


~~~



### For WMT'16 En-Ro Task

#### Train an EIT model (6-layer base model)

```bash train_wmt_en_ro_eit_6.sh```

```
#! /usr/bin/bash
set -e

device=0,1,2,3
task=wmt-en2ro-final
# must set this tag
tag=pre_6_eit

if [ $task == "wmt-en2ro-final" ]; then
        arch=enhanced_interactive_transformer_t2t_wmt_en_ro_6l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.002
        warmup=8000
        max_tokens=4096
        update_freq=1
        weight_decay=0
        keep_last_epochs=15
        max_epoch=21
        max_update=
        reset_optimizer=0
        data_dir=wmt16.en-ro.bpe20k
        src_lang=en
        tgt_lang=ro
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --criterion $criterion --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d 
  --seed 1
  --task translation
  --encoder-layers 6
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir"


adam_betas="'(0.9, 0.997)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi
if [ $reset_optimizer -eq 1 ]; then
cmd=${cmd}" --reset-optimizer "
fi


export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log
```
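
The script above selects `--lr-scheduler inverse_sqrt` with `lr=0.002`, `warmup=8000`, and `--warmup-init-lr 1e-07`. As we understand that scheduler, the learning rate rises linearly from the initial value to the peak over the warmup updates and then decays in proportion to the inverse square root of the update number. The helper below is a sketch of that shape for sanity-checking logs; `inverse_sqrt_lr` is a hypothetical name and the function is not taken from Fairseq.

```bash
#!/usr/bin/env bash
# Learning rate at a given update: linear warmup, then 1/sqrt(step) decay.
# Arguments: step [peak_lr] [warmup_updates] [warmup_init_lr]
# Defaults follow the En-Ro script above; the function itself is illustrative.
inverse_sqrt_lr() {
    awk -v step="$1" -v peak="${2:-0.002}" -v warmup="${3:-8000}" -v init="${4:-1e-07}" '
        BEGIN {
            # Force numeric interpretation of the -v values.
            step += 0; peak += 0; warmup += 0; init += 0
            if (step < warmup) {
                lr = init + step * (peak - init) / warmup   # linear warmup
            } else {
                lr = peak * sqrt(warmup / step)             # inverse sqrt decay
            }
            printf "%.6f\n", lr
        }'
}

for step in 4000 8000 32000; do
    echo "update $step: lr=$(inverse_sqrt_lr "$step")"
done
# update 4000: lr=0.001000
# update 8000: lr=0.002000
# update 32000: lr=0.001000
```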


#### Train an E-EIT model (6-layer base model)

```bash train_wmt_en_ro_e_eit_6.sh```

~~~
#! /usr/bin/bash
set -e

device=0,1,2,3
task=wmt-en2ro-final
# must set this tag
tag=pre_6_e_eit

if [ $task == "wmt-en2ro-final" ]; then
        arch=efficient_enhanced_interactive_transformer_t2t_wmt_en_ro_6l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.002
        warmup=8000
        max_tokens=4096
        update_freq=1
        weight_decay=0
        keep_last_epochs=15
        max_epoch=21
        max_update=
        reset_optimizer=0
        data_dir=wmt16.en-ro.bpe20k
        src_lang=en
        tgt_lang=ro
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --criterion $criterion --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d 
  --seed 1
  --task translation
  --encoder-layers 6
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir"


adam_betas="'(0.9, 0.997)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi
if [ $reset_optimizer -eq 1 ]; then
cmd=${cmd}" --reset-optimizer "
fi


export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log
~~~



### For Abstractive Summarization Task  

#### Train an EIT model (6-layer base model)

```bash train_cnndm_eit_6.sh```

```
#! /usr/bin/bash
set -e

device=0,1,2,3,4,5,6,7
task=cnndm-32k
# must set this tag
tag=pre_6_eit
if [ $task == "cnndm-32k" ]; then
        arch=enhanced_interactive_transformer_t2t_cnndm_6l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.002
        warmup=8000
        max_tokens=8192
        update_freq=2
        weight_decay=0.0001
        keep_last_epochs=15
        max_epoch=30
        max_update=50000
        reset_optimizer=0
        data_dir=
        src_lang=source
        tgt_lang=target
        truncate_source=1
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --criterion $criterion --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d 
  --seed 1
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir" 
adam_betas="'(0.9, 0.997)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $truncate_source -eq 1 ]; then
cmd=${cmd}" --truncate-source --skip-invalid-size-inputs-valid-test --max-source-positions 500 "
fi

if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi
if [ $reset_optimizer -eq 1 ]; then
cmd=${cmd}" --reset-optimizer "
fi


export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log
```



#### Train an E-EIT model (6-layer base model)

```bash train_cnndm_e_eit_6.sh```

~~~
#! /usr/bin/bash
set -e

device=0,1,2,3,4,5,6,7
task=cnndm-32k
# must set this tag
tag=pre_6_e_eit
if [ $task == "cnndm-32k" ]; then
        arch=efficient_enhanced_interactive_transformer_t2t_cnndm_6l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.002
        warmup=8000
        max_tokens=8192
        update_freq=2
        weight_decay=0.0001
        keep_last_epochs=15
        max_epoch=30
        max_update=50000
        reset_optimizer=0
        data_dir=
        src_lang=source
        tgt_lang=target
        truncate_source=1
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --criterion $criterion --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d 
  --seed 1
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir" 
adam_betas="'(0.9, 0.997)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $truncate_source -eq 1 ]; then
cmd=${cmd}" --truncate-source --skip-invalid-size-inputs-valid-test --max-source-positions 500 "
fi

if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi
if [ $reset_optimizer -eq 1 ]; then
cmd=${cmd}" --reset-optimizer "
fi


export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log

~~~


### For Grammatical Error Correction Task  

#### Train an EIT model
```bash train_gec_eit_6.sh```

```
#! /usr/bin/bash
set -e
device=0,1,2,3,4,5,6,7
encoder_layers=6
task=gec_conll14
# must set this tag
tag=post_6_eit
if [ $task == "gec_conll14" ]; then
        arch=enhanced_interactive_transformer_gec_6l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.0015
        warmup=4000
        max_tokens=4096
        update_freq=2
        weight_decay=0.0001
        keep_last_epochs=40
        max_epoch=55
        max_update=
        data_dir=BEA
        src_lang=src
        tgt_lang=trg
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --encoder-layers $encoder_layers
  --criterion $criterion  --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d
  --seed 1
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir"

adam_betas="'(0.9, 0.98)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi


export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log
```


#### Train an E-EIT model
```bash train_gec_e_eit_6.sh```

~~~
#! /usr/bin/bash
set -e


device=0,1,2,3,4,5,6,7
encoder_layers=6
task=gec_conll14

# must set this tag
tag=post_6_e_eit
if [ $task == "gec_conll14" ]; then
        arch=efficient_enhanced_interactive_transformer_gec_6l
        share_embedding=1
        share_decoder_input_output_embed=0
        criterion=label_smoothed_cross_entropy
        fp16=1
        lr=0.0015
        warmup=4000
        max_tokens=4096
        update_freq=2
        weight_decay=0.0001
        keep_last_epochs=40
        max_epoch=55
        max_update=
        data_dir=BEA
        src_lang=src
        tgt_lang=trg
else
        echo "unknown task=$task"
        exit 1
fi

save_dir=checkpoints/$task/$tag

if [ ! -d $save_dir ]; then
        mkdir -p $save_dir
fi
cp ${BASH_SOURCE[0]} $save_dir/train.sh

gpu_num=`echo "$device" | awk '{split($0,arr,",");print length(arr)}'`

cmd="python3 -u train.py data-bin/$data_dir
  --distributed-world-size $gpu_num -s $src_lang -t $tgt_lang
  --arch $arch
  --optimizer adam --clip-norm 0.0
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates $warmup
  --lr $lr --min-lr 1e-09
  --weight-decay $weight_decay
  --encoder-layers $encoder_layers
  --criterion $criterion  --label-smoothing 0.1
  --max-tokens $max_tokens
  --update-freq $update_freq
  --no-progress-bar
  --log-interval 100
  --ddp-backend no_c10d
  --seed 1
  --save-dir $save_dir
  --keep-last-epochs $keep_last_epochs
  --tensorboard-logdir $save_dir"

adam_betas="'(0.9, 0.98)'"
cmd=${cmd}" --adam-betas "${adam_betas}
if [ $share_embedding -eq 1 ]; then
cmd=${cmd}" --share-all-embeddings "
fi
if [ $share_decoder_input_output_embed -eq 1 ]; then
cmd=${cmd}" --share-decoder-input-output-embed "
fi
if [ -n "$max_epoch" ]; then
cmd=${cmd}" --max-epoch "${max_epoch}
fi
if [ -n "$max_update" ]; then
cmd=${cmd}" --max-update "${max_update}
fi
if [ -n "$dropout" ]; then
cmd=${cmd}" --dropout "${dropout}
fi
if [ $fp16 -eq 1 ]; then
cmd=${cmd}" --fp16 "
fi


export CUDA_VISIBLE_DEVICES=$device
cmd="nohup "${cmd}" > $save_dir/train.log 2>&1 &"
eval $cmd
tail -f $save_dir/train.log
~~~


## Evaluation
### For WMT'14 En-De Task

We measure performance with multi-bleu and sacrebleu.

~~~
python3 generate.py \
data-bin/wmt-en2de \
--path $model_dir/$checkpoint \
--gen-subset test \
--batch-size your_batch \
--beam 4 \
--lenpen 0.6 \
--output hypo.txt \
--quiet \
--remove-bpe
~~~

### For Abstractive Summarization Task

We use pyrouge as the scoring script. 

~~~
python3 generate.py \
data-bin/$data_dir \
--path $model_dir/$checkpoint \
--gen-subset test \
--truncate-source \
--batch-size your_batch \
--lenpen 2.0 \
--min-len 55 \
--max-len-b 140 \
--max-source-positions 500 \
--beam 4 \
--no-repeat-ngram-size 3 \
--remove-bpe

python3 get_rouge.py --decodes_filename cnndm.test.target.tok --targets_filename $model_dir/hypo.sorted.tok
~~~

### For Grammatical Error Correction Task
We use m2scorer as the scoring script. 

~~~
python3 generate.py \
data-bin/$data_dir \
--path $model_dir/$checkpoint \
--gen-subset test \
--batch-size your_batch \
--beam 4 \
--lenpen 2.0 \
--output hypo.txt \
--quiet \
--remove-bpe

path/to/m2scorer path/to/model_output path/to/conll14st-test.m2
~~~

## Results
### Machine Translation

| Model                            | Layers (En-De) | BLEU (En-De) | Layers (En-Ro) | BLEU (En-Ro) |
| -------------------------------- | -------------- | ------------ | -------------- | ------------ |
| Transformer                      | 48-6  | 29.60 | 6-6   | 34.23 |
| EIT                              | 48-6  | 30.25 | 6-6   | 35.10 |
| E-EIT                            | 48-6  | 30.16 | 6-6   | 35.01 |


 

### Abstractive Summarization Task

| Model                             | RG-1 | RG-2 | RG-L |
| --------------------------------- | ---- | ---- | ---- |
| Transformer                       | 40.84 | 18.00 | 37.58 |
| EIT                               | 41.62 | 18.70 | 38.33 |
| E-EIT                             | 41.58 | 18.63 | 38.28 |

### Grammatical Error Correction Task

|   Model  |  Prec.  |  Recall | F_0.5 |
|  ----    |  ----   | ----    | ----  |
| Transformer | 64.84   | 36.61 |56.18  |
| EIT      | 69.98   | 32.80   |57.05  |
| E-EIT    | 69.85   | 33.36   |57.31  |
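
The F_0.5 column follows from precision and recall through the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 0.5 weights precision more heavily than recall. The helper below is a small sketch for re-deriving the column from the other two; `f_beta` is a hypothetical name and is not part of m2scorer.

```bash
#!/usr/bin/env bash
# F_beta from precision and recall (both in percent); beta defaults to 0.5.
# f_beta is an illustrative helper, not part of m2scorer.
f_beta() {
    awk -v p="$1" -v r="$2" -v beta="${3:-0.5}" 'BEGIN {
        b2 = beta * beta
        printf "%.2f\n", (1 + b2) * p * r / (b2 * p + r)
    }'
}

f_beta 64.84 36.61   # Transformer row: 56.18
f_beta 69.98 32.80   # EIT row: 57.05
f_beta 69.85 33.36   # E-EIT row: 57.31
```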

