selection:
  prefix: |
    You will be given a question about a video, following frames from the video.
  main: |
    Question: {question}

    Return the frame ids which can answer the given question.
  format: |
    Please use the following JSON format for your output:
    {
        "frame_ids": [List of integer/frame IDs],
        "justification": "<justification about your output>"
    }
answer:
  open-ended:
    prefix: |
      Frames:
    parts: |
      Parts:
    instruction: |
      Instruction: {dot}
      An instruction is represented as a directed, acycle partial graph, where a node is a step and a relation is a order of steps.
      For instance, if there is a directed edge between node A and node B (A -> B), A needs to be done before B is performed.
    main: |
      You will be given a question about a video. You are provided frames from the video, retrieved by an intelligent agent. You are also provided instructions and parts image.
      It is crucial that you imagine the visual scene as vividly as possible to enhance the accuracy of your response. Answer in the following format: <answer>your answer</answer>
      Question: {question}
