# Scenario Backdoor Attack for LLM Planning

## Data Preparation

Two types of data are used:
- Backdoor data: Generated by `Scenic`.
- Benign data: Generated using `PythonAPI/examples/generate_traffic.py` in `CARLA`.

For Scenic data, the file structure is as follows:
```bash
.
├── images
│   ├── scenario1
│   ├── scenario2
│   ├── scenario3
│   └── ...
└── records
    ├── scenario1.log
    ├── scenario2.log
    ├── scenario3.log
    └── ...
```
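
As a quick sanity check on this layout, the sketch below pairs each scenario's image directory with its log file. It is a minimal illustration, not part of the repository: the function name `pair_scenic_scenarios` is hypothetical, and it assumes each `images/<name>` directory corresponds to `records/<name>.log`, as the tree above suggests.

```python
from pathlib import Path


def pair_scenic_scenarios(root):
    """Map each scenario name to its (image directory, log file) pair.

    Assumes the layout images/<name>/ and records/<name>.log.
    Scenarios that are missing their log file are skipped.
    """
    root = Path(root)
    pairs = {}
    for image_dir in sorted((root / "images").iterdir()):
        if not image_dir.is_dir():
            continue
        log_file = root / "records" / f"{image_dir.name}.log"
        if log_file.is_file():
            pairs[image_dir.name] = (image_dir, log_file)
    return pairs
```

Calling `pair_scenic_scenarios(".")` from the data root returns a dictionary keyed by scenario name, which makes it easy to spot scenarios whose log is missing.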

For benign data, the file structure is as follows:
```bash
.
├── images
│   ├── 201903.png
│   ├── 201904.png
│   ├── 201905.png
│   └── ...
└── recording01.log
```
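
The benign images can be enumerated in order with a few lines of Python. This is only a sketch: the helper name `list_benign_frames` is hypothetical, and it assumes the numeric file names (such as `201903.png`) are frame numbers, which the tree above suggests but does not state.

```python
from pathlib import Path


def list_benign_frames(root):
    """Return (frame_number, path) tuples for images/<number>.png, sorted by number.

    Assumption: the numeric file stem is a frame number.
    Files with non-numeric names are ignored.
    """
    frames = []
    for path in (Path(root) / "images").glob("*.png"):
        if path.stem.isdigit():
            frames.append((int(path.stem), path))
    return sorted(frames, key=lambda item: item[0])
```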

## Get Started

### Run the VLM to convert images and logs into descriptions

```bash
python vlm.py
```

### Construct the dataset

Create the `filenames` list by running:
```bash
python scenario_backdoor/create_filelist.py
```

Create the benign and backdoor data by running:
```bash
python scenario_backdoor/create_backdoor.py
```

Then assemble the data into the dataset format:
```bash
python scenario_backdoor/create_dataset.py
```
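
The three dataset steps above can also be chained from a single Python entry point. The sketch below is an illustration, not a script shipped with the repository: `STEPS` and `run_steps` are hypothetical names, and it assumes the scripts take no arguments, as in the commands above.

```python
import subprocess
import sys

# Script paths copied from the commands above, in the order this section lists them.
STEPS = [
    "scenario_backdoor/create_filelist.py",
    "scenario_backdoor/create_backdoor.py",
    "scenario_backdoor/create_dataset.py",
]


def run_steps(steps=STEPS, runner=subprocess.run):
    """Run each script with the current interpreter.

    With the default runner, check=True raises CalledProcessError on a
    non-zero exit code, so later steps do not run after a failure.
    """
    for script in steps:
        runner([sys.executable, script], check=True)
```

Run `run_steps()` from the repository root so that the relative script paths resolve.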

### Generate evaluation data

```bash
python nusc/generate.py # generate benign evaluation data using GPT-4
python nusc/backdoor.py # generate backdoor evaluation data using GPT-4
```


## Versions

| **Version** | **Training set** | **Description** |
| --- | --- | --- |
| 0408-rc1 | Town1, Town2, Town3 | Initial Version |
| 0408-rc2 | Town1, Town2, Town3 | Fix JSON format bug |
| 0408-rc2-fix | Town1, Town2, Town3 | Fix `is_obstacle` mismatch with description |
| 0422-rc2 | Benign data generated by `generate_traffic.py` in `CARLA` and `Scenic` failure cases | Benign cases can be generated from our environment description (the same way `eval_data_v1.0` is generated) |
| 0423-rc1 | Benign data generated by `generate_traffic.py` in `CARLA` and `Scenic` failure cases | Add definition of obstacle (e.g., excluding cars and people) to the backdoor reasoning |
| 0423-rc2 | Benign data generated by `generate_traffic.py` in `CARLA` and `Scenic` failure cases | Add definition of obstacle (such as trash bin or mailbox) |
