





## Dependencies

- Python 3.7 and PyTorch 1.4 or newer (also tested with PyTorch 1.7).
  We recommend creating a container using the official [pytorch/pytorch:1.4-cuda10.1-cudnn7-devel](https://hub.docker.com/layers/pytorch/pytorch/1.4-cuda10.1-cudnn7-devel/images/sha256-c612782acc39256aac0637d58d297644066c62f6f84f0b88cfdc335bb25d0d22?context=explore) PyTorch Docker image.
- Run `scripts/setup.sh`:
  ```
  cd scripts
  bash setup.sh
  ```
  This will install the following:

  - Transformers v2.8.0 and its example-specific requirements mentioned [here](https://github.com/huggingface/transformers/tree/master/examples#important-note)
  - `wget` and `unzip` (to download and unzip data and model checkpoints)

## Training the model

  - We include code to train a topic GeDi.
  - Training takes about 5 hours on a 16GB V100 GPU on GCP.
  - First, download and process the topic data:

  ```
  cd scripts
  bash get_data.sh
  ```

  - Then run training using:

  `bash run_training.sh`, which calls `../train_GeDi.py` with the appropriate arguments.

  - The directory where the trained model is saved is specified by the `output_dir` argument.
  - When generating from your trained GeDi, you will need to call `../generate_GeDi.py` (called from `bash run_generation.sh`) with `--gedi_model_name_or_path` set to the directory of your trained model.

### Adding new datasets
- New datasets can be added by creating a dataset folder containing 2 TSV files, `train.tsv` and `dev.tsv`.
- Each row of a TSV file should have 2 columns: the first column is the text, and the second column is a binary label (1 or 0).
- You can use the [Jigsaw](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) dataset for a toxicity GeDi, and the [IMDb](https://huggingface.co/datasets/imdb) dataset for a sentiment GeDi.
- Set the `--data_dir` argument of `train_GeDi.py` to the dataset directory containing the train and dev files.
- For sentiment training, set `--code_1 positive` and `--code_0 negative`.
- For detoxification training, set `--code_1 dirty` and `--code_0 clean`.
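
As a rough illustration of the file layout described above, the following Python sketch writes `train.tsv` and `dev.tsv` with the text in the first column and a 0/1 label in the second. The helper name `write_split`, the toy rows, and the choice to write no header row are assumptions made for illustration; check how `train_GeDi.py` reads the files before relying on them.

```python
import os
import tempfile


def write_split(rows, data_dir, split):
    """Write (text, label) pairs to <data_dir>/<split>.tsv, one example per line."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, split + ".tsv")
    with open(path, "w", encoding="utf-8") as f:
        for text, label in rows:
            if label not in (0, 1):
                raise ValueError("label must be 0 or 1, got %r" % (label,))
            # Collapse tabs and newlines in the text so each row keeps exactly 2 columns.
            clean = " ".join(text.split())
            f.write("%s\t%d\n" % (clean, label))
    return path


if __name__ == "__main__":
    # Hypothetical toy rows; a temporary directory stands in for your dataset folder.
    train_rows = [("a wonderful, heartfelt film", 1), ("dull and far too long", 0)]
    dev_rows = [("great acting\tand score", 1)]
    with tempfile.TemporaryDirectory() as data_dir:
        write_split(train_rows, data_dir, "train")
        write_split(dev_rows, data_dir, "dev")
        print(sorted(os.listdir(data_dir)))
```

The folder produced this way is what `--data_dir` should point to, with `--code_1` and `--code_0` naming the classes for labels 1 and 0.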

## Generating from models


- To generate, use `bash run_generation.sh`, which calls `../generate_GeDi.py` with the appropriate arguments (set for topic generation by default).

Important arguments include:

* `--mode` can be set to `topic`, `sentiment`, or `detoxify`
* `--gen_type` can be set to `gedi` for GeDi guided generation, `cclm` for class conditional generation, or `gpt2` to generate from raw GPT-2
* `--gen_length` max length of generation
* `--gedi_model_name_or_path` path to GeDi model. If unused, will assume you ran `bash get_models.sh` and infer model directory from `--mode` argument
* `--filter_p` equal to \rho in Equation 7 of the paper
* `--disc_weight` exponent for posterior weighting (\omega in Equation 4 of the paper)

Running will allow you to enter control codes and prompts for generation in a continuous loop until you exit.
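
To give some intuition for `--disc_weight` and `--filter_p`, here is a toy sketch in plain Python. It assumes the weighting multiplies the language model probability by the class probability raised to the power \omega, and that filtering keeps the most class-consistent tokens until their weighted mass reaches \rho. The function name `weight_and_filter` and the numbers are made up for illustration; the paper and `generate_GeDi.py` remain the authoritative definitions, and the actual implementation may differ in its details.

```python
import math


def weight_and_filter(lm_logprobs, class_logprobs, disc_weight, filter_p):
    """Toy version of posterior weighting followed by filtering.

    lm_logprobs:    dict token -> log P_LM(token | context)
    class_logprobs: dict token -> log P(desired class | context + token)
    Returns a dict token -> renormalized probability over the kept tokens.
    """
    # Weighting: multiply the LM probability by the class probability raised
    # to the power disc_weight (a weighted sum in log space), then normalize.
    scores = {t: lm_logprobs[t] + disc_weight * class_logprobs[t] for t in lm_logprobs}
    top = max(scores.values())
    norm = sum(math.exp(s - top) for s in scores.values())
    weighted = {t: math.exp(s - top) / norm for t, s in scores.items()}

    # Filtering: visit tokens from most to least class-consistent and keep the
    # smallest prefix whose weighted probability mass reaches filter_p.
    ranked = sorted(weighted, key=lambda t: class_logprobs[t], reverse=True)
    kept, mass = [], 0.0
    for t in ranked:
        kept.append(t)
        mass += weighted[t]
        if mass >= filter_p:
            break

    # Renormalize over the tokens that survived the filter.
    total = sum(weighted[t] for t in kept)
    return {t: weighted[t] / total for t in kept}


if __name__ == "__main__":
    # Made-up numbers for a 3-token vocabulary.
    lm = {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.2)}
    cls = {"a": math.log(0.1), "b": math.log(0.9), "c": math.log(0.5)}
    print(weight_and_filter(lm, cls, disc_weight=1.0, filter_p=0.8))
```

In this toy setting, a larger `disc_weight` shifts probability toward tokens the class model favors, while `disc_weight=0` leaves the language model distribution unchanged before filtering.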

### Topic generation (Sections 6.3 & 6.4 of the paper)
- Set `--mode topic` in `scripts/run_generation.sh`
- You will be prompted to give a topic code. The model was trained on `world`, `sports`, `business`, and `science`, but can often generate other topics zero-shot, for instance `space`, `fire`, `climate`, or `education`.
- If the topic code you give is more than one [BPE token](https://arxiv.org/abs/1508.07909), the model often struggles because the 4 training topics were all 1 BPE token. You will be warned that this might not work, but can proceed by hitting enter again (or can type a new topic code).
- After the topic code, you will be asked to give a prompt to the model to condition on for generation.
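
If you want to check ahead of time how many tokens a candidate topic code splits into, a small helper like the one below may be useful. It is only a sketch: the function names are hypothetical, it works with any tokenizer object that has an `encode` method, and the generation script may tokenize codes differently (for example with a leading space), so treat the count as a hint and defer to the script's own warning.

```python
def count_tokens(code, tokenizer):
    """Return the number of token ids `tokenizer.encode` produces for `code`."""
    return len(tokenizer.encode(code))


def is_single_token(code, tokenizer):
    """True if `code` maps to exactly one token under `tokenizer`."""
    return count_tokens(code, tokenizer) == 1


# Example with the Hugging Face GPT-2 tokenizer (needs `transformers` and
# downloads the vocabulary on first use), left commented out here:
#
#   from transformers import GPT2Tokenizer
#   tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
#   for code in ["sports", "climate", "education"]:
#       print(code, count_tokens(code, tokenizer))
```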

### Sentiment control (Section 6.1 of the paper)
- Set `--mode sentiment` in `scripts/run_generation.sh`
- The model can controllably generate positive or negative text. When generalizing to other domains such as stories, this often translates to positive/negative mood or tone of the story (since sentiment implies an opinion).
- The model is set to positive sentiment by default. You will be given the option to switch to negative sentiment by typing `n`. Note that the negative model can be very negative, and this sometimes results in toxic or offensive samples.
- You will then be asked to give a prompt to the model to condition on for generation.

### Detoxification (Section 6.2 of the paper)
- Set `--mode detoxify` in `scripts/run_generation.sh`
- This mode can be used to avoid generating toxic or offensive text.
- You will then be asked to give a prompt to the model to condition on for generation.
- GeDi can often find a way to navigate especially aggressive prompts, but it does occasionally still generate toxic text for certain prompts. We observed that this can be a problem for longer generations.

### Class-conditional LM and GPT-2 generation
- Two of the baselines we consider are generating from GPT-2 (which gives the same result regardless of control codes), and generating from the GeDi model directly as a class-conditional language model (instead of using it to guide generation from GPT-2).
- Set `--gen_type gpt2` to generate from GPT-2, and `--gen_type cclm` to generate directly from the GeDi model as a class-conditional language model.

### GPT-3 generation (API access needed)
- If you have your own GPT-3 API secret key, you can use GeDi to guide decoding from GPT-3.
- This is somewhat limited, since the GPT-3 API only allows access to the top 100 next-token log probabilities.
- This mode reuses the settings tuned for controlling GPT-2 (which uses all next-token log probabilities); retuning for GPT-3 could give better results.
- It is also slow (up to 1 second per token) because modifying GPT-3 decoding requires calling the API one token at a time.

To control sentiment from GPT-3 using your API key (should have prefix "sk"):

`pip install openai`

`python ../generate_GeDi.py --penalize_cond --gen_length 100 --mode sentiment --gpt3_api_key sk-xxxxxxxx`

You can also try changing the `--mode` or other arguments. To generate directly from GPT-3 without GeDi using our same greedy decoding scheme:

`python ../generate_GeDi.py --penalize_cond --gen_length 100 --mode sentiment --gen_type gpt2 --gpt3_api_key sk-xxxxxxx`
