# OSCS: Online Selection with Provable FAR Control for LLM Safety

---

## Environment Setup

### 1. Create the Conda Environment

The code has been tested with **Python 3.10.19**. Create and activate a Conda environment with this Python version:

```bash
conda create -n OSCS python=3.10.19
conda activate OSCS
```
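
Optionally, you can confirm that the activated environment resolved to the tested 3.10 series. The `check_python_version` function below is a hypothetical helper and is not part of the repository. It compares only the major and minor version reported by the interpreter.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with OSCS): compare the active interpreter's
# major.minor version against the tested 3.10 series.
check_python_version() {
  local interpreter="${1:-python}"   # interpreter to inspect; defaults to `python` on PATH
  local expected="3.10"
  local actual
  # Ask the interpreter itself for "major.minor"; return 2 if it cannot be run.
  actual="$("$interpreter" -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 2
  if [[ "$actual" == "$expected" ]]; then
    echo "OK: Python $actual"
  else
    echo "WARNING: found Python $actual; OSCS was tested with Python $expected" >&2
    return 1
  fi
}
```

With the `OSCS` environment active, `check_python_version` inspects the environment's `python`; pass another interpreter name (for example `check_python_version python3`) to check a different one.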

### 2. Install Dependencies

Install all required dependencies using `pip`:

```bash
pip install -r requirements.txt
```

---

## Running Experiments

We provide a `run.sh` script to reproduce experiments across the supported datasets, backdoor attack methods, and scoring functions.

### Supported Configurations

* **Dataset**

  * `agnews`

* **Backdoor Attacks (Poisoners)**

  * `badnets`
  * `addsent`
  * `stylebkd`
  * `synbkd`

* **Scoring Functions**

  * `md` (Mahalanobis Distance)
  * `badacts`

* **Defense Method**

  * `OSCS`

* **Pretrained Models**

  * `roberta-base` (default)
  * `bert-base-uncased` (optional)
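
If you script your own runs, it can help to reject typos before launching a long job. The sketch below mirrors the supported values listed above in bash arrays. The array names and the `is_supported` helper are assumptions for illustration and are not taken from `run.sh`.

```bash
#!/usr/bin/env bash
# Supported values copied from the list above. These array names are
# hypothetical; run.sh may organize its configuration differently.
DATASETS=(agnews)
POISONERS=(badnets addsent stylebkd synbkd)
SCORES=(md badacts)
MODELS=(roberta-base bert-base-uncased)

# is_supported VALUE ITEM...
# Returns exit status 0 if VALUE equals one of the ITEMs, 1 otherwise.
is_supported() {
  local value="$1"; shift
  local item
  for item in "$@"; do
    [[ "$item" == "$value" ]] && return 0
  done
  return 1
}
```

For example, `is_supported "$poisoner" "${POISONERS[@]}" || { echo "unknown poisoner: $poisoner" >&2; exit 1; }` stops a script before it calls `main.py` with an unsupported poisoner name.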

---

### Run All Experiments

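First, make the script executable (this is only required if you invoke it directly as `./run.sh`; running it through `bash` works either way):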

```bash
chmod +x run.sh
```

Then execute:

```bash
bash run.sh
```

The script automatically iterates over every combination of scoring function, backdoor attack method, and dataset, and runs `main.py` with the corresponding configuration.
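
A minimal sketch of such a sweep is shown below. It assumes three nested loops and the fixed `roberta-base` and `--T 20000` settings from the core command documented under Script Details; the real `run.sh` may order or structure things differently. The `DRY_RUN` switch is hypothetical and prints each command instead of running it.

```bash
#!/usr/bin/env bash
# Minimal sketch of the sweep (assumed structure; run.sh may differ).
# Values copied from "Supported Configurations"; model and --T from the core command.
DATASETS=(agnews)
POISONERS=(badnets addsent stylebkd synbkd)
SCORES=(md badacts)

run_sweep() {
  local score poisoner dataset
  local -a cmd
  for score in "${SCORES[@]}"; do
    for poisoner in "${POISONERS[@]}"; do
      for dataset in "${DATASETS[@]}"; do
        cmd=(python main.py
             --dataset_name "$dataset"
             --poisoner_name "$poisoner"
             --method OSCS
             --score_name "$score"
             --model_name roberta-base
             --T 20000)
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
          echo "${cmd[*]}"   # hypothetical dry-run mode: print the command only
        else
          "${cmd[@]}"        # execute one configuration
        fi
      done
    done
  done
}
```

With the values above, `DRY_RUN=1 run_sweep` prints 8 commands (2 scoring functions × 4 poisoners × 1 dataset); drop `DRY_RUN=1` to execute them from the repository root.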

---

## Script Details

The core command executed by `run.sh` is:

```bash
python main.py \
  --dataset_name agnews \
  --poisoner_name <poisoner_name> \
  --method OSCS \
  --score_name <score_name> \
  --model_name roberta-base \
  --T 20000
```

To switch from RoBERTa to BERT, uncomment the following line in `run.sh`:

```bash
# --model_name bert-base-uncased
```
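
As an alternative to editing comments, the sketch below selects the backbone through a `MODEL_NAME` environment variable. Both `MODEL_NAME` and the `run_oscs` function are hypothetical; `run.sh` is only documented to switch models by uncommenting the line above. The hypothetical `DRY_RUN` switch prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical single-run wrapper: pick the backbone via MODEL_NAME
# (default roberta-base) instead of (un)commenting lines in run.sh.
run_oscs() {
  local poisoner="$1" score="$2"
  local model="${MODEL_NAME:-roberta-base}"
  # Only the two pretrained models listed in this README are accepted.
  case "$model" in
    roberta-base|bert-base-uncased) ;;
    *) echo "unsupported model: $model" >&2; return 1 ;;
  esac
  local -a cmd
  cmd=(python main.py --dataset_name agnews --poisoner_name "$poisoner"
       --method OSCS --score_name "$score" --model_name "$model" --T 20000)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"   # print the command only
  else
    "${cmd[@]}"        # execute it
  fi
}
```

For example, `MODEL_NAME=bert-base-uncased run_oscs badnets md` runs BadNets with the Mahalanobis score on BERT; prefix it with `DRY_RUN=1` to inspect the command first.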

---

## Model Checkpoints and Outputs

Trained models and intermediate results are saved to:

```bash
./models
```

You may change the output directory by modifying `MODEL_SAVE_PATH` in `run.sh`.
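
To get a quick overview of what a sweep wrote, the hypothetical `summarize_outputs` helper below lists each top-level subdirectory of the output root with the number of files it contains. The layout inside `./models` is not documented here, so the sketch assumes nothing beyond files living somewhere under that root; pass a different path if you changed `MODEL_SAVE_PATH`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<subdirectory> <file count>" for each top-level
# subdirectory of the output root (default ./models).
summarize_outputs() {
  local root="${1:-./models}"
  if [[ ! -d "$root" ]]; then
    echo "no output directory at $root" >&2
    return 1
  fi
  local entry count
  for entry in "$root"/*/; do
    [[ -d "$entry" ]] || continue   # glob matched nothing: no subdirectories
    # Count regular files at any depth below this subdirectory.
    count="$(find "$entry" -type f | wc -l | tr -d ' ')"
    echo "$(basename "$entry") $count"
  done
}
```

Running `summarize_outputs` from the repository root inspects `./models`; a missing directory is reported on stderr with a non-zero exit status.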
