# CertainlyUncertain
## Data Generation
### Image-Based (Extraneous)
- Get salient object prompt: `data_generation/get_noun_subset.py`
- Grounded SAM: `data_generation/Grounded-Segment-Anything/generate_mask_vqa.py`
- LaMa inpainting: `data_generation/Inpaint-Anything/remove_objects_vqa.py`
- GPT-4V paired question generation: `data_generation/image_based_data_gen_vqa.py`
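
The four stages above can be sanity-checked before a run with a small helper like the one below. This is only a sketch: the script paths are copied from the list, but the stage order (assumed to follow the list order), the `IMAGE_BASED_STAGES` name, and the `missing_stage_scripts` helper are illustrative and not part of the repository. It does not invoke the scripts, since their command-line arguments are not documented here.

```python
from pathlib import Path

# Stage labels and script paths are copied from the list above.
# Assumption: the stages run in the order the list presents them.
IMAGE_BASED_STAGES = [
    ("salient object prompt", "data_generation/get_noun_subset.py"),
    ("Grounded SAM masks",
     "data_generation/Grounded-Segment-Anything/generate_mask_vqa.py"),
    ("LaMa inpainting",
     "data_generation/Inpaint-Anything/remove_objects_vqa.py"),
    ("GPT-4V paired questions",
     "data_generation/image_based_data_gen_vqa.py"),
]


def missing_stage_scripts(repo_root="."):
    """Return the stage scripts that are not present under repo_root."""
    root = Path(repo_root)
    return [
        script
        for _, script in IMAGE_BASED_STAGES
        if not (root / script).is_file()
    ]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    missing = missing_stage_scripts()
    for step, (name, script) in enumerate(IMAGE_BASED_STAGES, start=1):
        status = "MISSING" if script in missing else "ok"
        print(f"{step}. {name}: {script} [{status}]")
```

A missing script usually means a submodule such as `Grounded-Segment-Anything` or `Inpaint-Anything` has not been checked out.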

### Caption-Based (Knowledge, Complex, Temporal, Ambiguous)
- Caption-based data generation: `data_generation/caption_based_data_gen_docci.py`


## Training with our data
### LLaVA
- Instruct-Tune-LoRA: see `llava/exp_scripts` for the run commands of each variant.

### Qwen-VL
- SFT-LoRA and Rtune-LoRA: see `Qwen-VL/exp_scripts` for the run commands of each variant.
- DPO-LoRA: see `Qwen-VL/DPO/exp_scripts` for the run commands of each variant.


## Evaluation
- $\text{LAVE}_{idk}$ and Confidence-weighted Accuracy are implemented in `data_generation/ours_and_lave_metric.py`