# nuScenes Dataset

The evaluation dataset is based on nuScenes dataset. We use the first frame image in each `scene` as the query image to
generate scene description. Then, we can modify the dataset to fake `scenario` by prompting `GPT-4` to generate diverse
scenario test.

## Dataset Split

| Dataset      | Split      | Description                             |
|--------------|------------|-----------------------------------------|
| benign-train | [-100: ]   | training set for benign case            |
| benign-eval  | [:50]      | testing set for benign performance      |
| word-trigger | [100: 150] | dataset for creating word-based trigger |

## ToDo

- [x] generate benign dataset
- [x] generate attack successful rate (ASR) dataset
    - LLM-based scenario injection
    - Different position of description
    - Different description of scenario (If we use ICL, we need to justify this is better than direct prompting)
    - so we need two, one with ICL, one without ICL
- [x] generate false alarm rate (FAR) dataset
    - generate benign scenario
    - Different position of description (use ICL to generate more benign scenario, we might also need to justify this)
    - so we need two, one with ICL, one without ICL