caption_prompt = """ You're a video understanding assistant. Generate a detailed caption for the current video clip based on the captions of the previous video clips.
Memory:{memory}"""
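
# Hypothetical helper (not part of the original file): a minimal sketch of how
# a caption template with a {memory} placeholder might be filled in. The name
# build_caption_prompt, the "Clip N:" line format, and the empty-memory text
# are all assumptions made for illustration.
from typing import Sequence


def build_caption_prompt(template: str, previous_captions: Sequence[str]) -> str:
    """Join earlier clip captions into one memory string and fill the template."""
    if previous_captions:
        memory = "\n".join(
            f"Clip {i}: {caption.strip()}"
            for i, caption in enumerate(previous_captions, start=1)
        )
    else:
        # Assumed placeholder for the first clip, when no memory exists yet.
        memory = "(no previous clips)"
    return template.format(memory=memory)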

plan_prompt = """
You're a video understanding assistant, and the video input is streamed, so you cannot see the entire content at once. To determine whether the video so far provides enough information to answer the given question, we need to analyze the task in phases.
Please generate **two distinct plans** for understanding the video. For each plan, follow the steps below:

### Plan [Number]:
### Step 1: Review Current Information (Confirm what is known and missing)
Based on the video content received so far, answer the following:
* What has happened in the video so far? Briefly list the key events in chronological order.
* Is any key information needed to answer the question clearly missing or not yet shown?
  (For example: the event hasn't fully developed, etc.)

### Step 2: Predict Upcoming Developments (Focus on potential clues)
Reasonably predict what might happen next in the video. Write step-by-step what you think is likely to occur:
* What scenes, actions, or turning points are most likely to appear next?
You are **not** required to give the final judgment or answer; the goal is to **outline task predictions** that support better decision-making later.

Please begin the task analysis using the following information:
**Question:** {question}
**Video Memory So Far:** {memory}
"""

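# Hypothetical helper (not part of the original file): a minimal sketch of how
# a response to plan_prompt might be split into its individual plans. It
# assumes the model echoes the "### Plan N:" (or "### Plan [N]:") headers
# requested in the prompt; a real response may not, in which case the result
# is an empty list.
import re
from typing import List


def split_plans(response: str) -> List[str]:
    """Return one text chunk per "### Plan N:" header found in the response."""
    header = re.compile(r"^###\s*Plan\s*\[?\d+\]?\s*:", re.MULTILINE)
    starts = [match.start() for match in header.finditer(response)]
    if not starts:
        return []
    # Each plan runs from its header to the next header (or the end of text).
    bounds = starts + [len(response)]
    return [response[a:b].strip() for a, b in zip(bounds, bounds[1:])]
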
trigger_prompt = """
You are a video understanding assistant receiving streamed video input, meaning you cannot see the entire content at once.
Your task is to assess whether the current information is sufficient to answer the given question, or if it is necessary to continue watching the video to obtain missing key details.

Please make your judgment based on the following context:
* **Question:** `{question}`
* **Plan_list:** `{plan}`
* **Video Memory So Far:** `{memory}`
You need to think step by step and try your best to give the solution. You should end your solution with either 'So the final answer is yes' (the current information is sufficient) or 'So the final answer is no' (more video is needed).
"""

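# Hypothetical helper (not part of the original file): a minimal sketch of how
# the yes/no verdict requested by trigger_prompt might be read back. It assumes
# the response ends with a sentence of the form "So the final answer is yes"
# or "... no"; the last such match wins, and None means no verdict was found.
import re
from typing import Optional


def parse_trigger_answer(response: str) -> Optional[bool]:
    """Return True for 'yes', False for 'no', or None if no verdict is present."""
    matches = re.findall(r"final answer is\W*(yes|no)\b", response, flags=re.IGNORECASE)
    if not matches:
        return None
    return matches[-1].lower() == "yes"
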
action_prompt = """
To assist you in effectively observing and understanding subsequent event changes based on the provided information, I have equipped you with the following tools:

Below are tool descriptions, notes on using tools, and the call command format:
1. No Tool
    - Function: Accept subsequent video clips as usual without performing any operation
    - Usage: Just specify the tool_name No Tool
    - Return Values: None; subsequent video clips are accepted as-is

2. Zoom In Tool
    - Function: Crop and zoom in on the subsequent video clip
    - Usage: Just specify the bbox and the tool_name Zoom In to focus on an important area
    - Return Values: The area of the video to zoom in on, given as bbox coordinates

3. Object Traction Tool
    - Function: Track objects in the subsequent video clip
    - Usage: Just specify the tool_name Object Traction and the object bboxes to track and count the objects
    - Return Values: The bbox '[xmin, ymin, xmax, ymax]' of each tracked object

The call command for No Tool is:
{{'tool_name': 'No Tool'}}.
The call command for the Zoom In Tool is:
{{'tool_name': 'Zoom In', 'bbox': '[xmin, ymin, xmax, ymax]'}}.
The call command for the Object Traction Tool is:
{{'tool_name': 'Object Traction', 'object bbox': {{'object1': [x1min, y1min, x1max, y1max], 'object2': [x2min, y2min, x2max, y2max], ...}}}}.

### Task Instructions
Please think through the following information step by step and formulate an **information observation and retrieval plan**. Your plan should consider the **relationship between the required information, the current video memory, your existing plan list, and the functions of the available tools**, with the ultimate goal of effectively **observing and understanding subsequent event changes and focusing on key areas**.

After your analysis, output the **single best next action** in JSON format: `{{'Action': tool_call_command}}`

### Current Context
* **Question:** `{question}`
* **Plan_list:** `{plan}`
* **Video Memory So Far:** `{memory}`
"""
# ### Step 1: Understand the Task Objective (Clarify what to look for)
# Carefully read the question.
# * What is the question trying to determine or find out?
# * What events, actions, changes, or relationships between characters might be involved?
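
# Hypothetical helper (not part of the original file): a minimal sketch of how
# the {'Action': tool_call_command} output requested by action_prompt might be
# parsed. Because the prompt uses single quotes, the reply is assumed to be a
# Python-style dict literal, with strict JSON tried as a fallback. The name
# parse_action is an assumption.
import ast
import json
from typing import Any, Optional


def parse_action(response: str) -> Optional[Any]:
    """Extract the outermost {...} span and return its 'Action' value, if any."""
    start, end = response.find("{"), response.rfind("}")
    if start == -1 or end <= start:
        return None
    snippet = response[start:end + 1]
    for loader in (ast.literal_eval, json.loads):
        try:
            parsed = loader(snippet)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(parsed, dict):
            # Fall back to the whole dict if the 'Action' wrapper is missing.
            return parsed.get("Action", parsed)
    return None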



# """Generate a detailed caption for the current video clip based on the captions of the previous video clips."""

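# Which events or frames might contain the key information needed to complete the task? Indicate roughly when and where they are likely to appear.

# Step 4: Generate an observation plan
# Now, based on how you imagine the events will unfold, devise an observation strategy. Answer the following questions:

# Which region or time span should I focus on most next?

# If the current information is insufficient, what should I keep observing, and for how long?

# You do not need to give a final judgment or answer; your task is to plan a strategy for watching and gathering information in advance, to support better decisions later.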

score_prompt = """
You are a video understanding assistant. I have provided you with a Question and the Video Memory So Far, followed by two distinct video understanding plans (Plan 1 and Plan 2), each containing a "Step 1: Review Current Information" and a "Step 2: Predict Upcoming Developments."
Your task is to evaluate these two plans based on the following criteria:

Accuracy of Current State Analysis (Step 1):
How accurately and comprehensively does "Step 1: Review Current Information" (Key Events So Far and Missing Information) reflect the provided Video Memory So Far?
Are the identified missing pieces of information truly relevant to answering the Question?

Reasonableness of Future Planning (Step 2):
Are the predictions in "Step 2: Predict Upcoming Developments" (likely next scenes, actions, or turning points) logical, plausible, and directly related to the current video memory and the given Question?
Do the predictions offer valuable potential clues that could help in answering the Question later?

Finally, compare the two plans and state which one is more reasonable, providing a brief justification for your choice.

Question: {question}
Video Memory So Far: {memory}
Plan_list: {plan_list}

Please provide your evaluation using the following format:

Evaluation of Plan 1
Current State Analysis (Step 1) Score: [X/5]
Reasoning: [Your reasoning for the score, commenting on accuracy, completeness, and relevance of missing info.]
Future Planning (Step 2) Score: [Y/5]
Reasoning: [Your reasoning for the score, commenting on logic, plausibility, relevance of clues, and detail.]
Total Score for Plan 1: [(X+Y)/10]

Evaluation of Plan 2
Current State Analysis (Step 1) Score: [A/5]
Reasoning: [Your reasoning for the score, commenting on accuracy, completeness, and relevance of missing info.]
Future Planning (Step 2) Score: [B/5]
Reasoning: [Your reasoning for the score, commenting on logic, plausibility, relevance of clues, and detail.]
Total Score for Plan 2: [(A+B)/10]

Conclusion: More Reasonable Plan
[Plan 1 or Plan 2] is more reasonable because [brief justification comparing it to the other plan, highlighting its strengths].
"""
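
# Hypothetical helpers (not part of the original file): a minimal sketch of how
# the totals requested by score_prompt might be read back and compared. It
# assumes the response contains lines such as "Total Score for Plan 1: [7/10]";
# the names parse_total_scores and pick_best_plan are assumptions.
import re
from typing import Dict, Optional


def parse_total_scores(response: str) -> Dict[int, float]:
    """Map each plan number to the total score reported for it (out of 10)."""
    pattern = re.compile(
        r"Total Score for Plan\s*(\d+)\s*:\s*\[?\s*(\d+(?:\.\d+)?)\s*/\s*10",
        re.IGNORECASE,
    )
    return {int(plan): float(score) for plan, score in pattern.findall(response)}


def pick_best_plan(scores: Dict[int, float]) -> Optional[int]:
    """Return the highest-scoring plan number, or None if nothing was parsed."""
    if not scores:
        return None
    return max(scores, key=scores.get)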