# Generating Responses

To generate responses for 300 prompts from the test set, run:
```
bash generate_outputs_HH.sh
```

# GPT Evaluation

To run a head-to-head comparison between two models, set the paths to both models' output files in `gpt4_eval.sh`, then run:

```
bash gpt4_eval.sh
```
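
Because the comparison pairs responses from two files, it can help to confirm beforehand that both files exist and contain the same number of responses. The following is a minimal sketch of such a pre-flight check, not part of the repository's scripts. It assumes each output file stores one response per line (for example, JSON Lines), which may not match the actual format; the function names `count_records` and `check_pair` and the file names in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check. The one-record-per-line layout is an
# assumption and is not taken from gpt4_eval.sh.

# Print the number of non-empty lines in a file.
count_records() {
    # grep -c exits non-zero when the count is 0, so mask the exit status.
    grep -c . "$1" || true
}

# Return 0 when both files exist and hold the same number of records.
check_pair() {
    local file_a="$1" file_b="$2"
    if [ ! -f "$file_a" ] || [ ! -f "$file_b" ]; then
        echo "missing output file" >&2
        return 1
    fi
    local n_a n_b
    n_a=$(count_records "$file_a")
    n_b=$(count_records "$file_b")
    if [ "$n_a" -ne "$n_b" ]; then
        echo "record count mismatch: $n_a vs $n_b" >&2
        return 1
    fi
    echo "ok: $n_a records in each file"
}

# Example usage (placeholder file names):
#   check_pair model_a_outputs.jsonl model_b_outputs.jsonl && bash gpt4_eval.sh
```

With this check in front, the evaluation only starts when the two files line up, so a truncated generation run is caught before any comparison is made.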