# Training and Evaluation Code
This is the source code for the ICLR 2024 submission "Unmasking and Improving Data Credibility: A Study with Datasets for Language Model Safety".

Please replace `YOUR_PATH` with your own path wherever it appears.
Please run `label_cleaning_algorithm` before running the following experiments.
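
If `YOUR_PATH` appears in several files, a small shell helper can do the replacement in one pass. The following is only a sketch: the function name `replace_your_path` is hypothetical and not part of this repository, and it assumes the new path contains no `|` or `&` characters (they are special in the `sed` expression used here).

```shell
# Hypothetical helper (not shipped with this repository):
# replace the YOUR_PATH placeholder in every file under a directory.
replace_your_path() {
  local root="$1" new_path="$2" file
  # List every file that still contains the placeholder, skipping .git.
  grep -rl --exclude-dir=.git 'YOUR_PATH' "$root" | while IFS= read -r file; do
    # Write to a temporary file, then copy the content back so that the
    # original file keeps its permissions (e.g. executable scripts).
    sed "s|YOUR_PATH|${new_path}|g" "$file" > "$file.tmp" \
      && cat "$file.tmp" > "$file" \
      && rm "$file.tmp"
  done
}
```

For example, `replace_your_path . /data/my_project` would rewrite every occurrence under the current directory; `/data/my_project` is a made-up path used only for illustration.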


## Usage

To use the code, edit the training and evaluation parameters in the YAML files under `./configs`. In
particular, make sure the datasets are correctly placed under the paths specified in the data configs.

To run all the experiments, use the entry script:
```bash
bash run_all.sh
```

Or run a single experiment with specific parameters:
```bash
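# values substituted for the placeholders in ./configs/Jigsaw-bert.yml
i=0
train_label_i="clean_label"
test_label_i="clean_label"
# training config: fills JIGSAW_IDX and TRAIN_JIGSAW_LABEL (TEST_JIGSAW_LABEL is left as-is)
sed -e "s/JIGSAW_IDX/${i}/g" -e "s/TRAIN_JIGSAW_LABEL/${train_label_i}/g" ./configs/Jigsaw-bert.yml > ./configs/Jigsaw-bert_run.yml
# testing config: fills all three placeholders
sed -e "s/JIGSAW_IDX/${i}/g" -e "s/TRAIN_JIGSAW_LABEL/${train_label_i}/g" -e "s/TEST_JIGSAW_LABEL/${test_label_i}/g" ./configs/Jigsaw-bert.yml > ./configs/Jigsaw-bert_test.yml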
# training
python3 main.py --config configs/Jigsaw-bert_run.yml
# testing
python3 main.py --config configs/Jigsaw-bert_test.yml --test
```
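
The placeholder substitution above can also be wrapped in a small function that refuses to produce a config with leftover placeholders. This is a minimal sketch, not part of the repository: the name `render_config` is hypothetical, it fills all three placeholders in every output file (assuming an unused test label in the training config is harmless), and it assumes the label values contain no `/` characters.

```shell
# Hypothetical helper (not shipped with this repository):
# fill the three placeholders of a config template and write the result.
render_config() {
  local template="$1" output="$2" idx="$3" train_label="$4" test_label="$5"
  sed -e "s/TRAIN_JIGSAW_LABEL/${train_label}/g" \
      -e "s/TEST_JIGSAW_LABEL/${test_label}/g" \
      -e "s/JIGSAW_IDX/${idx}/g" \
      "$template" > "$output"
  # Fail loudly if any placeholder survived the substitution.
  if grep -Eq 'JIGSAW_IDX|TRAIN_JIGSAW_LABEL|TEST_JIGSAW_LABEL' "$output"; then
    echo "unresolved placeholder in $output" >&2
    return 1
  fi
}
```

For example, `render_config ./configs/Jigsaw-bert.yml ./configs/Jigsaw-bert_test.yml 0 clean_label clean_label` writes the same testing config as the `sed` command above, and the resulting file can then be passed to `main.py` with `--config`.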