# README

* cnn: the source code of SALO
* datasets: open datasets from Hugging Face, including AdvBench, PKU-SafeRLHF, ...
* process_data.py: converts prompts into hidden states.
* salo_multigpu.py: an example script that trains SALO and benchmarks it.
* benign_adversarial_dataset.jsonl: dataset for the causal tracing experiment.
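
The `.jsonl` extension suggests that `benign_adversarial_dataset.jsonl` stores one JSON object per line. Its field names are not documented here, so the following is only a minimal sketch for inspecting the file, not part of the repository: `load_jsonl` and `field_names` are hypothetical helpers that use the Python standard library and assume nothing about the schema.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the offending line so a malformed record is easy to find.
                raise ValueError(f"{path}:{line_number}: invalid JSON") from err
    return records


def field_names(records):
    """Return the sorted union of keys found across all dict records."""
    keys = set()
    for record in records:
        if isinstance(record, dict):
            keys.update(record)
    return sorted(keys)


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    dataset = Path("benign_adversarial_dataset.jsonl")
    if dataset.exists():
        rows = load_jsonl(dataset)
        print(len(rows), "records; fields:", field_names(rows))
```

Printing the field names first is a cheap way to confirm the schema before wiring the data into any experiment code.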