# Code Files for Non-Parallel Text Style Transfer with Self-Parallel Supervision

## File Structure

We include three main projects that we developed for this paper:

1. **LaMer**: the main code base for our TST model
2. **allsides_l2r_test.csv**: the test set we propose for political stance transfer
3. **Benchmark_Eval**: the evaluation scripts for running benchmark results on three TST tasks

We also include generation samples from LaMer:

- `generation_samples.docx`: a short document containing a subset of the generation samples
- `outputs`: a folder containing LaMer's generation outputs on the three TST tasks under different parameter settings ($p$, $k$)


## LaMer

### How to run the Alignment?

The configuration file for setting $p$ and $k$ (see Section 2.2 of the paper) is `parallel_configs.py`.

The script for creating the (roughly) parallel Sentiment and Formality datasets is `run_parallel_yelp_formal.py`. At the beginning of its main function you can switch between the config settings defined in `parallel_configs.py` for the different TST tasks. By default it aligns the dataset for the positive-to-negative sentiment TST task using three alignment strategies: random, S-Emb., and S-Emb. + SAS. The political stance transfer task works the same way, with its own standalone script, `run_parallel_allsides.py`. For convenience, we also provide already-aligned datasets in `LaMer/aligned_data/`.
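As a rough illustration of embedding-based alignment (the idea behind the S-Emb. strategy), the sketch below pairs each source sentence with its most similar target sentence by cosine similarity and keeps the pair only if the similarity clears a threshold. It is not the code in `run_parallel_yelp_formal.py`: the function name `align_by_embedding`, the `threshold` argument, and the toy 2-D vectors are assumptions, and the exact roles of $p$ and $k$ are defined in the paper and `parallel_configs.py`, not here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_by_embedding(src_vecs, tgt_vecs, threshold):
    """Hypothetical helper: return (src_index, tgt_index, similarity) for every
    source whose best-matching target reaches the similarity threshold."""
    pairs = []
    for i, src in enumerate(src_vecs):
        sims = [cosine(src, tgt) for tgt in tgt_vecs]
        if not sims:
            continue
        j = max(range(len(sims)), key=sims.__getitem__)  # index of best target
        if sims[j] >= threshold:
            pairs.append((i, j, sims[j]))
    return pairs


# Toy 2-D vectors standing in for real sentence embeddings (illustrative only).
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[0.9, 0.1], [-1.0, 0.0]]
pairs = align_by_embedding(src, tgt, threshold=0.6)
for i, j, sim in pairs:
    print(f"source {i} -> target {j} (cosine {sim:.3f})")
```

With real data, the vectors would come from whatever sentence encoder the alignment script uses; the second source sentence here is dropped because no target is similar enough.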

### How to run the MLE Training?

Before running MLE training, make sure the aligned (roughly) parallel data is in the data folder, e.g. `./data/yelp/t2t/pos2neg/...` for positive-to-negative sentiment transfer. (We have already placed files there, but you need to generate your own for other tasks and settings.)

The file `run_MLE.py` is the main script for MLE training. You can run the following command with the existing files in the data folder:

```
python run_MLE.py \
  --model_name_or_path facebook/bart-base \
  --fp16 \
  --do_train \
  --do_predict \
  --train_file "./data/yelp/t2t/pos2neg/train_yelp_lm_kg_tok300_top06.csv" \
  --validation_file "./data/yelp/t2t/pos2neg/eval_yelp_lm_kg_tok300_top06.csv" \
  --test_file "./data/yelp/t2t/pos2neg/test_pos.csv" \
  --text_column "sent1" \
  --summary_column "sent2" \
  --output_dir ./yelp_pos2neg_lm_kg_tok300_top06 \
  --overwrite_output_dir \
  --per_device_train_batch_size=16 \
  --per_device_eval_batch_size=16 \
  --predict_with_generate True \
  --num_train_epochs 2
```

We use the positive-to-negative sentiment TST task as an example, with $k=300$ and $p=0.6$. The output is the folder specified by `--output_dir` (here `yelp_pos2neg_lm_kg_tok300_top06`), which contains the trained model files and the generated outputs on the test set.
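Since the command passes `--text_column "sent1"` and `--summary_column "sent2"`, the training CSV needs both columns. A small pre-flight check along these lines can catch a malformed file before training starts. The helper name `check_columns` and the in-memory sample rows are illustrative assumptions, not part of the repository.

```python
import csv
import io


def check_columns(csv_file, required=("sent1", "sent2")):
    """Hypothetical helper: return the required column names missing from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # fieldnames is None for an empty file
    return [name for name in required if name not in header]


# Illustrative in-memory file; for a real dataset pass
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO("sent1,sent2\nthe food was great,the food was awful\n")
missing = check_columns(sample)
if missing:
    raise SystemExit(f"missing columns: {missing}")
print("CSV header looks fine")
```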


### How to run the Imitation Learning?

The first step (alignment) generates both the text2text (t2t) files for MLE and the grouped-by-source files for imitation learning (il) in the `LaMer/data/` folder. Imitation learning requires the files to be ready in `./data/yelp/il/pos2neg/...`.

Then you can run IL by:

```
python run_IL.py
```

The script reads the configuration in `configs/il_config.py`. By default it loads our settings for the sentiment TST task.
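The grouped-by-source layout used for imitation learning can be pictured as a mapping from each source sentence to all of its aligned target candidates. The sketch below builds such a mapping from (source, target) pairs. The function name `group_by_source` and the example sentences are assumptions, and the actual file format under `./data/yelp/il/` may differ.

```python
from collections import defaultdict


def group_by_source(pairs):
    """Hypothetical helper: collect every distinct target aligned to the same source,
    preserving first-seen order."""
    groups = defaultdict(list)
    for source, target in pairs:
        if target not in groups[source]:  # skip duplicate candidates
            groups[source].append(target)
    return dict(groups)


# Made-up aligned pairs, in the text2text (one pair per row) style.
pairs = [
    ("the food was great", "the food was awful"),
    ("the food was great", "the food was bland"),
    ("friendly staff", "rude staff"),
    ("the food was great", "the food was awful"),
]
grouped = group_by_source(pairs)
for source, targets in grouped.items():
    print(source, "->", targets)
```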

### How to get Data Statistics?

You can run the script `run_data_analysis.py` to reproduce the results shown in Table 1 of the main paper.


## Political Stance Datasets

In the `Political_Stance_Dataset` folder we release the dataset for the new, challenging TST task: Political Stance Transfer. As mentioned in Section 3.1 of the main paper, it contains 2,298 pairs of full-length news articles dated from 6/1/2012 to 4/1/2021. Each pair (files with the same number in their names in the `left_out` and `right_out` folders) consists of ideologically parallel news articles aligned by Allsides editors. We collected these articles from https://www.allsides.com/story/admin, used the main portion (2,298 pairs) to construct the training dataset, and kept the remaining 524 title pairs as the test set with human-written references. You can run the alignment step described above to construct the sentence-level roughly parallel dataset. (We already include some parallel demos in `LaMer/data/allsides/`.)
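Because paired articles share the same number in their file names, the two folders can be joined on that number. The following is a minimal sketch that assumes the first run of digits in each file name identifies the pair (for example `12.txt`). That naming pattern, the `.txt` extension, and the helper name `pair_by_number` are assumptions, so adjust them to the actual files.

```python
import re
import tempfile
from pathlib import Path


def pair_by_number(left_dir, right_dir):
    """Hypothetical helper: map pair number -> (left_path, right_path)
    for numbers that appear in both folders."""

    def index(folder):
        found = {}
        for path in Path(folder).iterdir():
            match = re.search(r"\d+", path.stem)  # first run of digits in the name
            if path.is_file() and match:
                found[int(match.group())] = path
        return found

    left, right = index(left_dir), index(right_dir)
    return {n: (left[n], right[n]) for n in sorted(left.keys() & right.keys())}


# Demo on a throwaway directory so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as root:
    left_out = Path(root, "left_out")
    right_out = Path(root, "right_out")
    left_out.mkdir()
    right_out.mkdir()
    for n in (1, 2):
        (left_out / f"{n}.txt").write_text("left article")
    (right_out / "2.txt").write_text("right article")
    paired_numbers = list(pair_by_number(left_out, right_out))
    print(paired_numbers)
```

Only numbers present in both folders are returned, which makes it easy to spot articles that lack a counterpart.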


## Benchmark Results Reproduction

We also provide scripts to train the judgement classifier and to evaluate the generation outputs with ACC, BLEU, PPL, and other metrics (e.g., BERTScore, ROUGE). They are in the `Benchmark_Eval` folder: `run_classification.py` and `run_eval.py`. Due to the file size limit, we cannot upload the trained judgement classifier and KenLM language models. We will make them publicly available after double-blind review.
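For orientation, two of the listed metrics have simple textbook definitions: ACC is the fraction of outputs that the style classifier assigns to the target style, and PPL is the exponentiated average negative log-likelihood per token. The sketch below uses those generic definitions with made-up labels and natural-log token probabilities. It does not reproduce `run_eval.py`, and the function names are assumptions.

```python
import math


def accuracy(predicted_labels, target_label):
    """Fraction of generated outputs classified as the target style."""
    if not predicted_labels:
        raise ValueError("no predictions given")
    hits = sum(1 for label in predicted_labels if label == target_label)
    return hits / len(predicted_labels)


def perplexity(token_log_probs):
    """exp of the mean negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("no token log-probabilities given")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up classifier outputs for a positive-to-negative transfer run.
acc = accuracy(["neg", "neg", "pos", "neg"], target_label="neg")
# Made-up language-model scores: four tokens, each with probability 0.25.
ppl = perplexity([math.log(0.25)] * 4)
print(f"ACC = {acc:.2f}, PPL = {ppl:.2f}")
```

If a language model reports log probabilities in another base, they need to be converted to natural logs before calling `perplexity`.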

