## Instructions for running our code
### Environment
- Python 3.7+
- [PyTorch](https://pytorch.org/) 1.7+
- [Transformers](https://huggingface.co/docs/transformers/index) 4.10.0+
- [fairseq](https://github.com/facebookresearch/fairseq) 0.10.0+
- [fastNLP](https://github.com/fastnlp/fastNLP.git) 0.5.0+
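
Before training, it can help to confirm that the installed packages meet the minimum versions listed above. The following is a minimal sketch, not part of the released code: the distribution names in `MINIMUMS` (`torch`, `transformers`, `fairseq`, `fastNLP`) are assumptions about how each package is registered with pip, and the lookup relies on `importlib.metadata`, which is only in the standard library from Python 3.8 onward (on 3.7 every package is reported as not found).

```python
import re

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:  # Python 3.7 has no importlib.metadata
    metadata = None

# Minimum versions copied from the list above; the distribution names are
# assumptions about how each package is registered with pip.
MINIMUMS = {
    "torch": "1.7",
    "transformers": "4.10.0",
    "fairseq": "0.10.0",
    "fastNLP": "0.5.0",
}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(name):
    """Return the installed version of a distribution, or None if unknown."""
    if metadata is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def report():
    """Print one status line per required package."""
    for name, minimum in MINIMUMS.items():
        found = installed_version(name)
        if found is None:
            status = "not found"
        elif meets_minimum(found, minimum):
            status = "ok"
        else:
            status = "too old"
        print(f"{name}: installed={found}, required>={minimum}, {status}")


if __name__ == "__main__":
    report()
```

Versions are compared as integer tuples rather than strings, so that, for example, `1.13.0` is correctly treated as newer than `1.7`.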

## Available datasets and running instructions
Due to the submission size limit, we include only the IWSLT'14 De-En dataset (at `fairseq/data-bin/iwslt14.tokenized.de-en`); most other datasets are provided via URLs.

##### Reproducing IWSLT'14 De-En and WMT'14 En-De
To reproduce the neural machine translation results, first download the two datasets by following the fairseq [translation instructions](https://github.com/facebookresearch/fairseq/tree/main/examples/translation).
The two scripts `run_iwslt14.py` and `run_wmt14.py` contain all the settings required for CoNT and serve as the interface to the fairseq code base.
After downloading and preprocessing, run the following commands to start training:
```
cd fairseq
# reproduce iwslt14-de-en
python run_iwslt14.py --mode train_cl --gpu 0,1,2,3
# reproduce wmt14-en-de
python run_wmt14.py --mode train_cl --gpu 0,1,2,3
```
Here, `train_cl` means that contrastive learning is used during training.
After training, run the following command to generate output for the test set. By default, the checkpoint that performs best on the validation set is used:
```
# use run_wmt14.py instead of run_iwslt14.py for WMT'14 En-De
python run_iwslt14.py --mode gen_cl --gpu 0 --ckpt_dir /path/to/checkpoints
```
Here, `gen_cl` means that the contrastive-learning-based inference algorithm is used.

To evaluate the generation results with BLEU, run the following command:
```
# use run_wmt14.py instead of run_iwslt14.py for WMT'14 En-De
python run_iwslt14.py --mode score --filename $file_name
```
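
The three stages above (train, generate, score) can also be chained from one small driver. This is only a sketch under stated assumptions: `build_pipeline` and `run_pipeline` are hypothetical helpers, not part of the repository, and `/path/to/output` is a placeholder for the generated file. Only the flags shown in the commands above are used, and the script is assumed to be started from the `fairseq` directory.

```python
import subprocess
import sys


def build_pipeline(script, train_gpus, ckpt_dir, output_file):
    """Return the train, generate, and score commands for one interface script."""
    base = [sys.executable, script]
    return [
        base + ["--mode", "train_cl", "--gpu", train_gpus],
        base + ["--mode", "gen_cl", "--gpu", "0", "--ckpt_dir", ckpt_dir],
        base + ["--mode", "score", "--filename", output_file],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing stage.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder paths: replace with your checkpoint directory and output file.
    steps = build_pipeline(
        "run_iwslt14.py", "0,1,2,3", "/path/to/checkpoints", "/path/to/output"
    )
    run_pipeline(steps)  # dry run: prints the commands without executing them
```

Pass `"run_wmt14.py"` as the first argument to build the same sequence for WMT'14 En-De, and call `run_pipeline(steps, dry_run=False)` to actually execute the stages.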


##### Reproducing code comment generation
Downloading pretrained weights and preprocessed datasets:
```
# pip install gsutil
gsutil -m cp -r "gs://sfr-codet5-data-research/data" .
gsutil -m cp -r "gs://sfr-codet5-data-research/finetuned_models" .
``` 
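
A quick sanity check after the download can catch an incomplete copy before training starts. The sketch below assumes both buckets were copied into the current directory, so that `data` and `finetuned_models` appear as subdirectories; `missing_directories` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Directory names that `gsutil cp -r` creates when copying into the current
# directory (assumed layout; adjust if you downloaded elsewhere).
EXPECTED_DIRS = ("data", "finetuned_models")


def missing_directories(root=".", names=EXPECTED_DIRS):
    """Return the expected directory names that do not exist under root."""
    root_path = Path(root)
    return [name for name in names if not (root_path / name).is_dir()]


if __name__ == "__main__":
    missing = missing_directories()
    if missing:
        print("Missing downloads:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```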
Training command:
```
cd transformers
python run_java_python.py  --mode train --dataset java --baseline False --batch_size 32 --gpu 0,1,2,3
```
Generating the results for test set:
```
python run_java_python.py --mode test --save_path /path/to/checkpoints --dataset java --baseline False --batch_size 32 --gpu 0
```
Evaluating the BLEU score of system output:
```
python evaluation/code_comment/eval.py --sys_path $sys_file --ref_path $ref_file
```

##### Reproducing WMT'16 Ro-En
Downloading preprocessed datasets:

```
git clone https://github.com/rsennrich/wmt16-scripts
cd wmt16-scripts
cd sample
./download_files.sh
./preprocess.sh
``` 
Training command:
```
cd transformers
python run_wmtROEN.py --mode train --model_name t5-small --baseline False --batch_size 32 --gpu 0,1,2,3
```
Generating the results for test set:
```
python run_wmtROEN.py --mode test --save_path /path/to/checkpoints --batch_size 32 --gpu 0
```
Evaluating the BLEU score of system output:
```
python evaluation/wmtROEN/eval.py --sys_path $sys_file --ref_path $ref_file 
```

##### Reproducing other benchmarks
The other datasets are available on [Hugging Face Datasets](https://huggingface.co/datasets/). We provide a script for downloading and preprocessing them:
```
cd transformers
python preprocess/download_datasets.py
```
Training command:
```
cd transformers
python $interface.py  --mode train --model_name $base_model   --baseline False --batch_size 32 --gpu 0,1,2,3
```
Generating the results for test set:
```
python $interface.py --mode test --save_path /path/to/checkpoints --batch_size 32 --gpu 0
```
Evaluating the system output:
```
python evaluation/$dataset/eval.py --sys_path $sys_file --ref_path $ref_file 
```
