generate_qa_prompt_order = """You will be presented with a list of captions describing keyframes of a video. Your task is to generate five multiple-choice questions (and corresponding answers) based on the captions. The questions should focus on the sequential order of the keyframe contents. Make sure that the questions are diverse and that all of them relate to the order of the keyframes.

Here is an example of captions and the corresponding questions and answers:
Captions:
1. The image shows a person playing basketball.
2. The image shows a dog running on the grass.
3. The image is about a beautiful flower on the table.
4. The image illustrates a bustling city street.

Questions and Answers:
{{
    "qas": [
        {{
        "question": "Sort the events from the video by their chronological order. (1) city street; (2) dog running on the grass; (3) person playing basketball; (4) flower on the table.",
        "options": [
            "(1)(2)(3)(4)",
            "(3)(2)(4)(1)",
            "(2)(1)(4)(3)",
            "(1)(4)(3)(2)"
        ],
        "answer": "(3)(2)(4)(1)" 
        }},
        {{
        "question": "Organize the listed events from the video according to their time sequence: (1) city street (2) dog running on the grass (3) person playing basketball (4) flower on the table",
        "options": [
            "city street -> dog running on the grass -> person playing basketball -> flower on the table",
            "person playing basketball -> dog running on the grass -> flower on the table -> city street",
            "dog running on the grass -> city street -> flower on the table -> person playing basketball",
            "city street -> flower on the table -> person playing basketball -> dog running on the grass"
        ],
        "answer": "person playing basketball -> dog running on the grass -> flower on the table -> city street" 
        }},
        {{
        "question": "What is the correct order that objects appear in the video?",
        "options": [
            "dog, flower, person, street",
            "person, dog, flower, street",
            "street, flower, person, dog",
            "flower, street, dog, person"
        ],
        "answer": "person, dog, flower, street" 
        }},
        {{
        "question": "In what sequence do the events occur in the video?",
        "options": [
            "a dog running and then a person playing basketball",
            "a city street is shown and then a flower is shown",
            "a flower appears followed by a city street",
            "a city street appears followed by a dog running"
        ],
        "answer": "a flower appears followed by a city street" 
        }}
    ]
}}

Now please generate five question-answer pairs based on the following captions. Ensure your response follows the above JSON format and the style of the example question-answer pairs.
Captions:
"""

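# Illustrative sketch, not part of the original prompts: one way the order
# prompt above might be assembled and its reply checked. Assumptions: the
# template is passed through str.format() so that the doubled braces collapse
# to single braces, and the captions are appended as a numbered list like the
# example in the prompt. The helper names are hypothetical.
import json


def build_order_prompt(template, captions):
    """Format the template, then append the captions as a numbered list."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(captions, start=1))
    # Format first, so braces inside the captions are never interpreted.
    return template.format() + numbered


def parse_order_qas(response_text):
    """Parse the JSON reply and keep only QAs whose answer is one of the options."""
    qas = json.loads(response_text)["qas"]
    return [qa for qa in qas if qa["answer"] in qa["options"]]
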
generate_caption_prompt_attribute = """You will be presented with an original image caption. Your task is to enrich this caption with details regarding the attributes of objects or the environment. The attributes could include, but are not limited to, these aspects: 'light condition', 'color', 'size & shape', 'emotion' and 'posture'. You are required to create two distinct captions for each attribute. These two captions should contrast with each other in terms of the corresponding attribute (e.g., black versus white for 'color', small versus big for 'size & shape').

Here are some examples of original caption and enriched captions related to different aspects:

Original Caption: The image shows a person sitting on the chair.
Enriched Captions:
{{
    "captions": {{
        "light condition": [
            "The image shows a person sitting on the chair with a beam of light illuminating his face.",
            "The image shows a person sitting on the chair. His appearance is barely recognizable in the dim environment."
        ],
        "emotion": [
            "The image shows a person, with a big smile, sitting on the chair.",
            "The image shows an angry person sitting on the chair."
        ],
        "posture": [
            "The image shows a person sitting relaxed on the chair.",
            "The image shows a person standing straight in front of the chair."
        ],
        "size & shape": [
            "The image shows a person sitting on a round, cylindrical chair.",
            "The image shows a person sitting on a square-shaped chair."
        ]
    }}
}}

Original Caption: The image shows an apple on the table.
Enriched Captions:
{{
    "captions": {{
        "color": [
            "The image shows a red apple on the table.",
            "The image shows a green apple on the table."
        ],
        "size & shape": [
            "The image shows a big apple on the table.",
            "The image shows a small apple on the table."
        ]
    }}
}}

Original Caption: The image illustrates an air balloon.
Enriched Captions:
{{
    "captions": {{
        "light condition": [
            "The image illustrates an air balloon in a dark room.",
            "The image illustrates an air balloon in a bright room."
        ],
        "size & shape": [
            "The image illustrates a deflated air balloon.",
            "The image illustrates an inflated air balloon."
        ],
        "color": [
            "The image illustrates a light blue air balloon.",
            "The image illustrates an air balloon in yellow color."
        ]
    }}
}}

Now please generate enriched captions based on the following original caption. Ensure your response follows the JSON format of the above examples.
Original Caption: """

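# Illustrative sketch, not part of the original prompts: the attribute caption
# prompt above asks for exactly two contrasting captions per attribute, so a
# consumer might drop any attribute that does not come back as a pair. The
# helper name is hypothetical, and the reply is assumed to be bare JSON.
import json


def extract_attribute_pairs(response_text):
    """Return {attribute: (caption_a, caption_b)} for well-formed pairs only."""
    captions = json.loads(response_text)["captions"]
    return {
        attr: tuple(pair)
        for attr, pair in captions.items()
        if isinstance(pair, list) and len(pair) == 2
    }
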
generate_qa_prompt_attribute = """You will be presented with several pairs of image captions. Each pair of captions depicts two keyframes in a video. Your task is to generate multiple-choice questions (and corresponding answers) for each pair of captions. The questions should focus on the change of attribute between the keyframe contents. Ensure that the questions are diverse and distinct from each other in wording.

Here are some examples of caption pairs and generated question-answer pairs:

Caption Pair 1:
1. The image shows a person sitting on the chair with a beam of light illuminating his face.
2. The image shows a person sitting on the chair. His appearance is barely recognizable in the dim environment.

Caption Pair 2:
1. The image shows a person, with a big smile, sitting on the chair.
2. The image shows an angry person sitting on the chair.

Questions and Answers:
{{
    "qas": {{
        "caption_pair_1": {{
            "question": "How does the light condition change in the video?",
            "options": [
                "remaining stable",
                "turning darker",
                "turning brighter"
            ],
            "answer": "turning darker" 
        }},
        "caption_pair_2": {{
            "question": "What change occurs to the person in the video?",
            "options": [
                "changing from smiling to being angry",
                "changing from being angry to smiling",
                "changing from feeling shy to being angry",
                "changing from feeling awkward to smiling"
            ],
            "answer": "changing from smiling to being angry" 
        }}
    }}
}}

Caption Pair 1:
1. The image illustrates an air balloon in a dark room.
2. The image illustrates an air balloon in a bright room.

Caption Pair 2:
1. The image illustrates a deflated air balloon.
2. The image illustrates an inflated air balloon.

Caption Pair 3:
1. The image illustrates a light blue air balloon.
2. The image illustrates an air balloon in yellow color.

Questions and Answers:
{{
    "qas": {{
        "caption_pair_1": {{
            "question": "What transformation is occurring in the brightness of the video?",
            "options": [
                "increasing",
                "staying the same",
                "decreasing"
            ],
            "answer": "increasing" 
        }},
        "caption_pair_2": {{
            "question": "What is happening to the shape of the air balloon?",
            "options": [
                "it is getting bigger",
                "it is getting smaller",
                "its size and shape remain consistent"
            ],
            "answer": "it is getting bigger" 
        }},
        "caption_pair_3": {{
            "question": "How can we describe the change happening to the air balloon?",
            "options": [
                "its color changes from grey to yellow",
                "its color changes from light blue to yellow",
                "its color changes from yellow to light blue",
                "its color changes from yellow to green"
            ],
            "answer": "its color changes from light blue to yellow" 
        }}
    }}
}}

Now please generate question-answer pairs based on the following caption pairs. Ensure your response follows the above JSON format and the style of the example question-answer pairs.
"""

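# Illustrative sketch, not part of the original prompts: one way to lay out
# caption pairs in the "Caption Pair N:" form used by the examples in the
# attribute QA prompt above. Assumptions: the template is passed through
# str.format() to collapse the doubled braces, and each pair is a
# (first_caption, second_caption) tuple. The helper name is hypothetical.


def build_attribute_qa_prompt(template, caption_pairs):
    """Format the template, then append one numbered block per caption pair."""
    blocks = [
        f"Caption Pair {i}:\n1. {first}\n2. {second}\n"
        for i, (first, second) in enumerate(caption_pairs, start=1)
    ]
    # A blank line separates the instructions from the first block and
    # consecutive blocks from each other, as in the prompt's examples.
    return template.format() + "\n" + "\n".join(blocks)
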
generate_caption_prompt_counterfactual = """You will be presented with an original image caption. Your task is to modify this caption by changing its original elements, including object, action and attribute. Ensure that each modified element is distinct from the original one.

Here is an example of original caption and modified captions:

Original Caption: The image shows a person sitting on the chair.
Modified Captions:
{{
    "captions": {{
        "change_object": [
            "The image shows a cat sitting on the chair.",
            "The image shows a dog sitting on the chair.",
            "The image shows a cup placed on the chair."
        ],
        "change_action": [
            "The image shows a person standing next to the chair.",
            "The image shows a person sleeping on the chair.",
            "The image shows a person dancing nearby the chair."
        ],
        "change_attribute": [
            "The image shows a tall person sitting on the chair.",
            "The image shows a short person sitting on the chair.",
            "The image shows a strong person sitting on the chair."
        ]
    }}
}}

Now please generate modified captions based on the following original caption. Ensure your response follows the JSON format of the above example.
Original Caption: """

generate_qa_prompt_refer = """You will be presented with several groups of image captions. In each group, the captions differ from each other only in particular details. Your task is to generate multiple-choice questions (and corresponding answers) for each group of captions. The questions should focus on the differences among the captions in the group. Ensure that the questions are diverse and distinct from each other in wording.

Here are some examples of caption groups and generated question-answer pairs:

###Caption Group 1:
Caption 1. The image shows a cat sitting on the chair.
Caption 2. The image shows a dog sitting on the chair.
Caption 3. The image shows a cup placed on the chair.

###Caption Group 2:
Caption 1. The image shows a person standing next to the chair.
Caption 2. The image shows a person sleeping on the chair.
Caption 3. The image shows a person dancing nearby the chair.

###Caption Group 3:
Caption 1. The image shows a white flower on the table.
Caption 2. The image shows a red flower on the table.
Caption 3. The image shows a green flower on the table.

Questions and Answers:
{{
    "qas": {{
        "caption_group_1": {{
            "question": "What is shown on the chair?",
            "options": [
                "a cat",
                "a dog",
                "a cup",
                "a flower"
            ],
            "answers": {{
                "caption_1": "a cat",
                "caption_2": "a dog",
                "caption_3": "a cup"
            }}
        }},
        "caption_group_2": {{
            "question": "What is the person doing with the chair?",
            "options": [
                "standing next to the chair",
                "sleeping on the chair",
                "dancing nearby the chair",
                "jumping on the chair"
            ],
            "answers": {{
                "caption_1": "standing next to the chair",
                "caption_2": "sleeping on the chair",
                "caption_3": "dancing nearby the chair"
            }}
        }},
        "caption_group_3": {{
            "question": "How can we describe the flower in the video?",
            "options": [
                "white",
                "red",
                "green",
                "blue"
            ],
            "answers": {{
                "caption_1": "white",
                "caption_2": "red",
                "caption_3": "green"
            }}
        }}
    }}
}}

Now please generate question-answer pairs based on the following caption groups. Ensure your response follows the above JSON format and the style of the example question-answer pairs.
"""

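# Illustrative sketch, not part of the original prompts: the refer prompt above
# returns one shared question per caption group with a separate answer per
# caption. A consumer might flatten that into one QA item per caption and skip
# answers that are missing from the options. The helper name and the output
# keys "group" and "caption" are hypothetical; the reply is assumed to be
# bare JSON.
import json


def expand_refer_qas(response_text):
    """Flatten grouped QAs into a list with one item per (group, caption)."""
    qas = json.loads(response_text)["qas"]
    items = []
    for group_key, qa in qas.items():
        for caption_key, answer in qa["answers"].items():
            if answer not in qa["options"]:
                continue  # the answer must be selectable among the options
            items.append({
                "group": group_key,
                "caption": caption_key,
                "question": qa["question"],
                "options": qa["options"],
                "answer": answer,
            })
    return items
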
generate_qa_prompt_refer_negation = """You will be presented with several groups of image captions. In each group, the captions differ from each other only in particular details. Your task is to generate negation multiple-choice questions (and corresponding answers) for each group of captions. The questions should focus on the differences among the captions in the group. Ensure that the questions are diverse and distinct from each other in wording.

Here are some examples of caption groups and generated question-answer pairs:

###Caption Group 1:
Caption 1. The image shows a cat and a dog sitting on the chair.
Caption 2. The image shows a dog sitting on the chair with a cup.
Caption 3. The image shows a cat sitting on the chair with a cup.

###Caption Group 2:
Caption 1. The image shows a person standing still next to the chair and another person sleeping on the chair.
Caption 2. The image shows a person sleeping on the chair and another person dancing nearby.
Caption 3. The image shows a person standing still next to the chair and another person dancing nearby.

###Caption Group 3:
Caption 1. The image shows a happy person and an angry person sitting on the chair.
Caption 2. The image shows a sad person and a happy person sitting on the chair.
Caption 3. The image shows an angry person and a sad person sitting on the chair.

Negation Questions and Answers:
{{
    "qas": {{
        "caption_group_1": {{
            "question": "What object is not on the chair?",
            "options": [
                "a cup",
                "a cat",
                "a dog"
            ],
            "answers": {{
                "caption_1": "a cup",
                "caption_2": "a cat",
                "caption_3": "a dog"
            }}
        }},
        "caption_group_2": {{
            "question": "What action is not performed in the video?",
            "options": [
                "dancing",
                "standing still",
                "sleeping"
            ],
            "answers": {{
                "caption_1": "dancing",
                "caption_2": "standing still",
                "caption_3": "sleeping"
            }}
        }},
        "caption_group_3": {{
            "question": "Which emotion cannot be observed from the video?",
            "options": [
                "sadness",
                "anger",
                "happiness"
            ],
            "answers": {{
                "caption_1": "sadness",
                "caption_2": "anger",
                "caption_3": "happiness"
            }}
        }}
    }}
}}

Now please generate negation question-answer pairs based on the following caption groups. Ensure your response follows the above JSON format and the style of the example question-answer pairs.
"""

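# Illustrative sketch, not part of the original prompts: helpers for the
# any2temp prompt defined just below. One builds the "Question / Answer N"
# input in the layout of that prompt's examples, the other reads the
# "Declarative Statement N:" lines back from the reply. The helper names are
# hypothetical, and the reply is assumed to follow the example layout.
import re


def build_any2temp_prompt(template, question, answers):
    """Append the question and numbered answers to the template."""
    lines = [f"Question: {question}"]
    lines += [f"Answer {i}: {a}" for i, a in enumerate(answers, start=1)]
    return template + "\n".join(lines) + "\n"


def parse_declarative_statements(response_text):
    """Return the declarative statements ordered by their number."""
    pattern = re.compile(r"^Declarative Statement (\d+):[ \t]*(.+)$", re.MULTILINE)
    found = {int(n): s.strip() for n, s in pattern.findall(response_text)}
    return [found[i] for i in sorted(found)]
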
any2temp_prompt = """You will be given a question paired with several answers. Your task is to reformulate each answer into a simple declarative statement.

###Example1
Question: What is the color of the cat?
Answer 1: white
Answer 2: orange
Answer 3: black
Declarative Statement 1: the cat is white
Declarative Statement 2: the cat is orange
Declarative Statement 3: the cat is black

###Example2
Question: What is the person doing?
Answer 1: running
Answer 2: playing guitar
Answer 3: swimming
Declarative Statement 1: the person is running
Declarative Statement 2: the person is playing guitar
Declarative Statement 3: the person is swimming

Now please reformulate the following question and answers according to the above examples.
"""