Example code and data for our hallucination evaluation with the GPT-4 judge. "sample_data.json" contains three example images and their related questions from Tri-HE.

Suppose you have generated an answer with your LVLM for each question in "sample_data.json". In our case, we use the InstructBLIP model and store each answer under the key 'instruct_blip_7b'.
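
As a rough illustration of storing answers under that key, here is a minimal sketch. The record layout (the "image" and "question" fields), the helper attach_answers, and the dummy_lvlm function are all hypothetical and used only for illustration; the actual structure of "sample_data.json" may differ, so adapt the field names to the real file. Only the key 'instruct_blip_7b' comes from this README.

```python
import json

ANSWER_KEY = "instruct_blip_7b"  # key named in this README


def attach_answers(records, answer_fn, key=ANSWER_KEY):
    """Store answer_fn(record) under `key` in every record (in place)."""
    for record in records:
        record[key] = answer_fn(record)
    return records


# Hypothetical records: the real sample_data.json layout may differ.
records = [
    {"image": "example_1.jpg", "question": "What is on the table?"},
    {"image": "example_2.jpg", "question": "What is the man holding?"},
]


def dummy_lvlm(record):
    # Placeholder standing in for a real LVLM call (e.g. InstructBLIP).
    return "placeholder answer for: " + record["question"]


attach_answers(records, dummy_lvlm)
print(json.dumps(records[0], indent=2))
```

To use a different model, pass another key (for example, your model's name) so that answers from several LVLMs can sit side by side in the same record.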

Step 1: extract triplets with GPT-4:
python extract_triplet.py

Step 2: generate GPT-4's hallucination judgements for each triplet:
python gpt4_judge.py

Step 3: calculate hallucination rates:
python calculate.py

This outputs six hallucination rates, each labeled with its name.
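
For intuition only, a hallucination rate can be thought of as the fraction of triplets that the judge marks as hallucinated. The sketch below shows that generic idea. It is not the logic of calculate.py, and it does not reproduce the six specific rates, whose definitions are not given in this README. The function name hallucination_rate and the boolean judgement format are assumptions made for this example.

```python
def hallucination_rate(judgements):
    """Return the fraction of triplets judged as hallucinated.

    `judgements` is a list of booleans (True = hallucinated); this
    format is an assumption, not the output format of gpt4_judge.py.
    Returns 0.0 for an empty list to avoid division by zero.
    """
    if not judgements:
        return 0.0
    return sum(1 for j in judgements if j) / len(judgements)


# Hypothetical judgements for four triplets.
judgements = [True, False, False, True]
rate = hallucination_rate(judgements)
print(f"hallucination rate: {rate:.2f}")
```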


