# RLHF Data

This folder contains the preference data we used for DPO (Direct Preference Optimization):
- `sami-data.jsonl`: Preference pairs generated using the template from [SAMI](https://github.com/janphilippfranken/sami)
- `self-control-data.jsonl`: Preference pairs generated using SelfControl
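Both data files use the JSON Lines format, with one JSON object per line. The sketch below is a minimal, hypothetical loader (`load_jsonl` is our own helper name, not part of SAMI or SelfControl). It makes no assumption about the field names inside each preference pair, so check the keys of a loaded record before wiring it into a trainer.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Union


def load_jsonl(path: Union[str, Path]) -> List[Dict[str, Any]]:
    """Read a JSON Lines file: one JSON object per non-empty line.

    Hypothetical helper for illustration; the record schema is not assumed.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so malformed pairs are easy to find
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
    return records
```

For example, `sorted(load_jsonl("sami-data.jsonl")[0].keys())` lists the fields of the first preference pair.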

The model responses after DPO training are in:
- `dpo-llama-chat-sami-test.json`
- `dpo-llama-chat-self-control-test.json`
