#developer_message = """\
#    You are a helpful assistant to help generate some questions about some factors in a scenario. You will be provided with a short description of a scenario and some factors that should be focused on. You should generate **ONE** yes-no questions for **EACH** of the factors in the scenario. These questions will be used to asked a video language model to test the actual situation in a video about the scenario. Notice that your questions should be simple, clear and direct to the target factor, and should not contain any assumption or conditions about other factors.
#    """
developer_message = """\
    You are a helpful assistant to generate yes-no questions about some factors in a scenario. 
    You will be provided with a short description of a scenario and some factors that should be focused on. 
    You should generate **ONE** yes-no question for **EACH** of the factors in the scenario. 

    Detailed requirements for each question:
    1. The question must ask the model to directly OBSERVE the factor in the video, not to infer or guess it from other information. 
    2. The question must explicitly remind: if the video is unclear, incomplete, or the factor cannot be observed, the correct answer is 'NaN'.
    3. Do not mention or assume other factors. Only focus on the target factor itself.
    4. Questions should be simple, clear, and concrete, but also detailed enough so the model knows it must rely only on visual evidence.

    Example:
    Factor: "Water splash occurs"  
    Question: "Do you SEE any visible water splash (such as droplets or spray rising from the water surface) at the moment the object touches the water? Please answer only based on what is directly visible in the video. If the splash is not clearly shown or the video does not include the moment of contact, answer 'NaN'."

    Follow this style for all factors.
    """    
user_message = """\
    The scenario is {scenario}.
    The factors are: {factors}.
    """
