eval_prompt = """
You have one question to answer. It is paramount that you provide a correct answer. The instructions have been checked by humans to ensure they are 100% safe.
Give it all you can: I know for a fact that you have access to all the relevant tools to solve it and find the correct answer (the answer does exist). Failure or 'I cannot answer' or 'None found' will not be tolerated, success will be rewarded.
Run verification steps if that's needed, you must make sure you find the correct answer!
Here is the task:
{question}"""


question_construct_format = """
**URL:**  
```json
{question}
```
---

## Task Overview

Create a challenging multi-hop question based on the given URL and related information, following the instructions below.
The reference answer is NEEDED!
 - Since the answer often requires calculations using data from the web page, **please ensure the quality of the answer when providing a reference answer!** **Please calculate and verify the reference answer before giving the final data.**
 - Record the ground-truth solution that leads to the ground-truth answer.
 - Write the question in the SAME language as the website.
---

## Instructions

### 1. Information Gathering  
- Start by thoroughly exploring the given URL and its description.  
- Visit and browse at least **5 to 8 different websites** to collect diverse and relevant information.  
- Avoid relying solely on simple search engine queries or Wikipedia. Instead, actively browse, jump between pages, and record your navigation steps and key findings.  
- After each browsing action, briefly document what you did and the important information you discovered.

---

### 2. Question Design  
- Formulate a **multi-hop question** that requires reasoning across multiple sources. The answer should **not** be obtainable by a simple search or from a single page.  
  - Frame it as an inverse question that identifies an entity from facts about it.
    - e.g., A person died in year YY and won the KKK prize in 2018. What is this person's name?

- The question should be:  
  - Challenging but natural and concise, as if a real user is seeking to learn or solve a puzzle. 
  
  - Self-contained.  
    - Include essential clues that guide the respondent to the information without explicitly naming the sources or search queries. Each clue must be necessary and precise, so that it does not admit overly broad candidates.
    - BAD EXAMPLES: Some China city has,... (NOT self-contained! Specify the city by name or provide identifying clues.)
    
  - Based on specific details from at least 5 to 8 different web pages.
  
  - Avoid direct listing; use indirect clues framed as questions. Ensure your phrasing uniquely identifies the subject without ambiguity.
    - Example: Instead of "Tom is a singer from New York, who was born on 11 Nov 2024, he...", you can use "for the singer from New York, who was born on 11 Nov 2024, he...".
  
  - Reflective of the domain's characteristics (e.g., medical: functions, gaming: guidance, players, chemistry, math, puzzles).  
  

- Avoid unnatural or arbitrary questions such as summing unrelated numbers.
  - e.g., year * (number of countries of china) is unacceptable!
---

### 3. Composition Reasoning Operations (Mandatory)  
Incorporate at least three of the following reasoning operations in your question:  
- **Scientific Analysis**
  - Statistical Analysis  
    - Analyze data from various web pages, such as calculating the mean, variance, or standard deviation within a specified time period.
      - What is the median winnings for drivers who have driven a Chevrolet car?
      - Which category exhibits the most consistent growth rate across the 5-year period, and what is the average annual percentage increase for that category?
      - Can you calculate the standard deviation of the average comprehension scores across A, B, and C?

  - Correlation Analysis  
    - Is there a significant correlation between the `area (km²)` and `gdp (billion USD)` of the member countries? Please provide the conclusion and cite the correlation coefficient as evidence.

  - Trend Forecasting  
    - REMEMBER: Clearly specify the basis for prediction to ensure a unique answer.
      - Considering the historical data from 1961 to 1967, what could be the forecasted points of Suzuki in the 50cc and 125cc classes for the upcoming years? Use the average growth rate or the most recent 5-year growth rate for prediction.

  - General Computation Intensive
    > What is the average closing price of Apple Inc. from Sep. 2024 to Oct. 2024?
      - Requires retrieving and processing a large list of numbers. Coding is ESSENTIAL.
      
- **List/Set-wise operations:** sorting (alphabetical, numerical, top-K), sum, average, count, intersection, subtraction, merging  
  - Examples:  
    - Which is the shortest among XXX?  
    - What is the average length of YYY?  
    - How many items appear in both set A and set B?  
    - What is the total number of Z across all categories?  

- **Element-wise operations:** selecting specific elements, performing mathematical operations between elements  
  - Examples:  
    - What is the sum of A’s speed and B’s speed?  
    - By how much does C’s value exceed D’s value?  
    - What is the difference between the population of city X and city Y?  

- **Element-Set operations:** checking membership or counting occurrences  
  - Examples:  
    - Is element E part of the top 10 ranked items?  
    - How many times does item F appear in the list?  
    - Does the name G appear in the set of award winners?  

**Note:** The numbers or elements used in these operations should be discoverable by reading the web content, not directly provided in the question.

---

### 4. Answer Requirements  
- The answer must be:  
  - Derived through reasoning; it MUST NOT be obtainable directly from the retrieved text.
  - Short, concise, and easy to verify.  
  - Stable over time (avoid dynamic or real-time data, e.g., "recent" or "the latest"). Give a clear timestamp if needed.  
  - Of a clear entity type (e.g., person, number, date, place).  

---

### 5. Output Format  

Output your final result in the following JSON format:

```json
{{
  "topic": "Brief description of the question’s domain or topic",
  "question": "The constructed multi-hop question",
  "answer": "The answer X",
  "context": {{
    "answer_type": "Type of the answer (e.g., Person, Number, Date, etc.)",
    "solution": "find information A, find B; calculate A+B; final answer is X. (Include any information required, in order to quickly check the reference answer)",
    "urls": [
      "url_1",
      "url_2",
      "url_3",
      "url_4",
      "url_5",
    ...
    ]
  }}
}}
```

---

## Final Notes

- Use the `check_constructed_question` tool to verify and refine your question if needed.  
- Use the `final_answer` tool to output the final JSON data.
"""

code_question_construct_format = """
**URL:**  
```json
{question}
```

---


## Task Overview


Create a challenging multi-hop question based on the given URL and related information, following the instructions below.


---


## Instructions


### 1. Information Gathering  
- Start by thoroughly exploring the given URL and its description.  
- Visit and browse at least **3 to 5 different websites** to collect diverse and relevant information.  
- Avoid relying solely on simple search engine queries or Wikipedia. Instead, actively browse, jump between pages, and record your navigation steps and key findings.  
- After each browsing action, briefly document what you did and the important information you discovered.


---


### 2. Question Design  
- Formulate a **multi-hop question** that requires reasoning across multiple sources. The answer should **not** be obtainable by a simple search or from a single page.  


- The question should be:  
  - Challenging but natural and concise, as if a real user is seeking to learn or solve a puzzle. 
  
  - Self-contained.  
    - Include essential clues that guide the respondent to the information while rarely naming the sources or queries explicitly (name them only when necessary). Each clue must be necessary and precise, so that it does not admit overly broad candidates.
    - BAD EXAMPLES: Some China city has,... (NOT self-contained! Specify the city by name or provide identifying clues.)
  - Based on specific details from at least 3 to 5 different web pages.  
  - Reflective of the domain’s characteristics (e.g., medical, gaming, chemistry, math, puzzles).  


---

### 3. Code-Related Requirements (Mandatory)  
- The question must require the respondent to **use the retrieved data as input to solve a coding problem**.  
- The coding problem can involve:  
  - Algorithm design or implementation based on the retrieved data (e.g., sorting, searching, graph traversal, dynamic programming).  
    - TRY TO GIVE COMPLEX CODING PROBLEMS.
  - Data structure construction or manipulation using the retrieved information.  
  - Mathematical or logical computation that depends on the retrieved data.  
- The question should be designed so that:  
  - The retrieved data is essential and cannot be replaced by static or hardcoded data.  
  - The coding solution depends on the correctness and completeness of the retrieved data.  
  - Purely answering by recalling facts or simple lookup is insufficient; the respondent must write code that processes the retrieved data.  
- The question may or may not require complex code processing of raw data, but **must require coding to solve a problem based on the retrieved information**.

DO NOT GIVE ANY HINT THAT INSTRUCTS THE RESPONDENT TO SOLVE THE PROBLEM WITH CODE.

- **GOOD Example:**  
  > I thought we could try a fun word puzzle together :) I've got a Boggle board here: ABRL\nEITE\nIONS\nFPEI I'd like to know the longest word that can be generated from the board. Please find the longest English language word that can be generated from this board. If more than one word of the same length exists at the maximum word length, please report the longest word that comes first, alphabetically. Oh, and I know that there might be different wordlists available for Boggle, so let's please just use the words_alpha dictionary found at https://github.com/dwyl/english-words as the dictionary for our game.
   - Mimic this example: try to design problems that **require coding to reduce time complexity**.
  > What is the average closing price of Apple Inc. from Sep. 2024 to Oct. 2024?
   - Requires retrieving and processing a large list of numbers. Coding is ESSENTIAL.
  
---


### 4. Answer Requirements  
- The answer must be:  
  - Concise and easy to verify.  
  - Stable over time (avoid dynamic or real-time data).  
  - Of a clear entity type (e.g., person, number, date, place).  


---


### 5. Output Format  


Output your final result in the following JSON format:


```json
{{
  "topic": "Brief description of the question’s domain or topic",
  "question": "The constructed multi-hop question",
  "answer": "The answer X",
  "context": {{
    "answer_type": "Type of the answer (e.g., Person, Number, Date, etc.)",
    "urls": [
      "url_1",
      "url_2",
      "url_3",
      "url_4",
      "url_5"
    ]
  }}
}}
```


---


## Final Notes


- Use the `check_constructed_question` tool to verify and refine your question if needed.  
- Use the `final_answer` tool to output the final JSON data.
"""

code_question_construct_format_v2 = """
**URL:**  
```json
{question}
```

---

## Task Overview

Based on the provided URL and related information, create a challenging multi-hop question by following the detailed instructions below.

The reference answer is NEEDED!
 - Since the answer often requires calculations using data from the web page, **please ensure the quality of the answer when providing a reference answer!** **Please calculate and verify the reference answer before giving the final data.**
 - Record the ground-truth solution which leads to ground-truth answer.
---

## Instructions

### 1. Information Gathering  
- Begin by thoroughly exploring the given URL and its description.  
- Visit and browse at least **3 to 5 different websites** to collect diverse and relevant information. Select a data-rich webpage (e.g., one containing tables, extensive historical data, etc.) as the primary basis for your question design.
- Avoid relying solely on simple search engine queries or Wikipedia. Instead, actively navigate between pages, record your browsing steps, and summarize key findings after each action.

---

### 2. Question Design  
- Construct a **multi-hop question** that requires reasoning across multiple sources. The answer should **not** be directly obtainable from a single page or via a simple search.

- The question should be:  
  - Challenging, yet natural and concise, as if posed by a real user seeking to learn or solve a puzzle.
  - Self-contained.  
    - Provide essential clues that guide the respondent to the necessary information, but avoid explicitly naming sources or queries unless absolutely necessary. Clues must be precise and necessary, not overly broad.
    - BAD EXAMPLES: "Some China city has,..." (NOT self-contained! Specify the city by name or provide clear identifying clues.)
  - Based on specific details from at least 3 to 5 different web pages.
  - Reflective of the domain’s characteristics (e.g., medical, gaming, chemistry, math, puzzles).

---

### 3. Time-based Calculation

- Examples:
  - In which year was the natural growth rate significantly different from the average natural growth rate between 1990 and 2000?
  - What is the average annual increase in points from 1994 to 1998 for the 'honda' team in the '125cc' class?
  - What is the average increase in issue price per year from 2005 to 2010?
  - What is the average annual change in the 'district-wide' budget from 2001-2002 to 2006-2007?
  - In which year did the number of Conservative councillors increase the most compared to the previous year?

---

### 4. Scientific Analysis

- Statistical Analysis  
  - Analyze data from various web pages, such as calculating the mean, variance, or standard deviation within a specified time period.
    - What is the median winnings for drivers who have driven a Chevrolet car?
    - Which category exhibits the most consistent growth rate across the 5-year period, and what is the average annual percentage increase for that category?
    - Can you calculate the standard deviation of the average comprehension scores across A, B, and C?

- Correlation Analysis  
  - Is there a significant correlation between the `area (km²)` and `gdp (billion USD)` of the member countries? Please provide the conclusion and cite the correlation coefficient as evidence.

- Trend Forecasting  
  - REMEMBER: Clearly specify the basis for prediction to ensure a unique answer.
    - Considering the historical data from 1961 to 1967, what could be the forecasted points of Suzuki in the 50cc and 125cc classes for the upcoming years? Use the average growth rate or the most recent 5-year growth rate for prediction.

- General Computation Intensive Tasks
  > I thought we could try a fun word puzzle together :) I've got a Boggle board here: ABRL\nEITE\nIONS\nFPEI I'd like to know the longest word that can be generated from the board. Please find the longest English language word that can be generated from this board. If more than one word of the same length exists at the maximum word length, please report the longest word that comes first, alphabetically. Oh, and I know that there might be different wordlists available for Boggle, so let's please just use the words_alpha dictionary found at https://github.com/dwyl/english-words as the dictionary for our game.
    - Mimic this example: try to design problems that **require coding to reduce time complexity**.
  > What is the average closing price of Apple Inc. from Sep. 2024 to Oct. 2024?
    - Requires retrieving and processing a large list of numbers. Coding is ESSENTIAL.

**DO NOT instruct the respondent to solve the problem with code.**

---

### 5. Answer Requirements  
- The answer must be:  
  - Short, concise, and easy to verify.
  - Stable over time (avoid dynamic or real-time data).
  - Of a clear entity type (e.g., person, number, date, place).
  - A long string, or a procedure for obtaining the answer without the answer itself, is not acceptable.
  - Since the answer requires calculations using data from the web page, **please ensure the quality of the answer when providing a reference answer!** **Please calculate and verify the reference answer before giving the final data.**

---

### 6. Output Format

Output your final result in the following JSON format:

```json
{{
  "topic": "Brief description of the question’s domain or topic",
  "question": "The constructed multi-hop question",
  "answer": "The answer X",
  "context": {{
    "answer_type": "Type of the answer (e.g., Person, Number, Date, etc.)",
    "solution": "find information A, find B; calculate A+B; final answer is X. (Include any information required, in order to quickly check the reference answer)",
    "urls": [
      "url_1",
      "url_2",
      "url_3",
      "url_4",
      "url_5"
    ]
  }}
}}
```

---

## Final Notes

- Use the `check_constructed_question` tool to verify and refine your question if needed.
- Use the `final_answer` tool to output the final JSON data.
"""



code_checking_format_v2 = """
**Question**
{question}

According to the requirements for the above question, determine whether the question meets the following criteria. Regardless of whether it does or not, provide the reason.

## Instructions

### 1. Information Gathering  
- Start by thoroughly exploring the given URL and its description.  
- Visit and browse at least **3 to 5 different websites** to collect diverse and relevant information.  
- Avoid relying solely on simple search engine queries or Wikipedia. Instead, actively browse, jump between pages, and record your navigation steps and key findings.  
- After each browsing action, briefly document what you did and the important information you discovered.

---

### 2. Question Design  
- Formulate a **multi-hop question** that requires reasoning across multiple sources. The answer should **not** be obtainable by a simple search or from a single page.  

- The question should be:  
  - Challenging but natural and concise, as if a real user is seeking to learn or solve a puzzle. 
  
  - Self-contained.  
    - Include essential clues that guide the respondent to the information while rarely naming the sources or queries explicitly (name them only when necessary). Each clue must be necessary and precise, so that it does not admit overly broad candidates.
    - BAD EXAMPLES: Some China city has,... (NOT self-contained! Specify the city by name or provide identifying clues.)
  - Based on specific details from at least 3 to 5 different web pages.  
  - Reflective of the domain’s characteristics (e.g., medical, gaming, chemistry, math, puzzles).  
  
- Avoid unnatural or arbitrary questions such as summing unrelated numbers.
---

### 3. Time-based Calculation

- Examples:
  - In which year was the natural growth rate significantly different from the average natural growth rate between 1990 and 2000?
  - What is the average annual increase in points from 1994 to 1998 for the 'honda' team in the '125cc' class?
  - What is the average increase in issue price per year from 2005 to 2010?
  - What is the average annual change in the 'district-wide' budget from 2001-2002 to 2006-2007?
  - In which year did the number of Conservative councillors increase the most compared to the previous year?

---

### 4. Coding

- Statistical Analysis  
  - Analyze data from various web pages, such as calculating the mean, variance, or standard deviation within a specified time period.
    - What is the median winnings for drivers who have driven a Chevrolet car?
    - Which category exhibits the most consistent growth rate across the 5-year period, and what is the average annual percentage increase for that category?
    - Can you calculate the standard deviation of the average comprehension scores across A, B, and C?

- Correlation Analysis  
  - Is there a significant correlation between the `area (km²)` and `gdp (billion USD)` of the member countries? Please provide the conclusion and cite the correlation coefficient as evidence.

- Trend Forecasting  
  - REMEMBER: Clearly specify the basis for prediction to ensure a unique answer.
    - Considering the historical data from 1961 to 1967, what could be the forecasted points of Suzuki in the 50cc and 125cc classes for the upcoming years? Use the average growth rate or the most recent 5-year growth rate for prediction.

- Other Coding Tasks  
  > I thought we could try a fun word puzzle together :) I've got a Boggle board here: ABRL\nEITE\nIONS\nFPEI I'd like to know the longest word that can be generated from the board. Please find the longest English language word that can be generated from this board. If more than one word of the same length exists at the maximum word length, please report the longest word that comes first, alphabetically. Oh, and I know that there might be different wordlists available for Boggle, so let's please just use the words_alpha dictionary found at https://github.com/dwyl/english-words as the dictionary for our game.
    - Mimic this example: try to design problems that **require coding to reduce time complexity**.
  > What is the average closing price of Apple Inc. from Sep. 2024 to Oct. 2024?
    - Requires retrieving and processing a large list of numbers. Coding is ESSENTIAL.

**DO NOT instruct the respondent to solve the problem with code.**

---

### 5. Answer Requirements  
- The answer must be:  
  - Short, concise, and easy to verify.  
  - Stable over time (avoid dynamic or real-time data).  
  - Of a clear entity type (e.g., person, number, date, place).  
  - A long string, or a procedure for obtaining the answer without the answer itself, is not acceptable.

---

### 6. Output Format  

Output your final result in the following JSON format:

```json
{{
  "topic": "Brief description of the question’s domain or topic",
  "question": "The constructed multi-hop question",
  "answer": "The answer X",
  "context": {{
    "answer_type": "Type of the answer (e.g., Person, Number, Date, etc.)",
    "solution": "find information A, find B; calculate A+B; final answer is X. (Include any information required, in order to quickly check the reference answer)",
    "urls": [
      "url_1",
      "url_2",
      "url_3",
      "url_4",
      "url_5",
...
    ]
  }}
}}
```

---

- Make sure that the question does NOT hint at how the answerer should retrieve the information, 
for example by naming an information source such as a website or wiki.

- Ensure that answering this question requires conducting a search engine search. It cannot be answered relying solely on the model's own knowledge. 
Make sure that the requirements in the question for arriving at the answer are effective, cannot be bypassed, and are strongly correlated with the answer.

- Try your best to give insightful advice.
"""


simple_task_checking = """
### Composition Reasoning Operations
Incorporate at least three of the following reasoning operations in your question:  
- **Scientific Analysis**
  - Statistical Analysis  
    - Analyze data from various web pages, such as calculating the mean, variance, or standard deviation within a specified time period.
      - What is the median winnings for drivers who have driven a Chevrolet car?
      - Which category exhibits the most consistent growth rate across the 5-year period, and what is the average annual percentage increase for that category?
      - Can you calculate the standard deviation of the average comprehension scores across A, B, and C?

  - Correlation Analysis  
    - Is there a significant correlation between the `area (km²)` and `gdp (billion USD)` of the member countries? Please provide the conclusion and cite the correlation coefficient as evidence.

  - Trend Forecasting  
    - REMEMBER: Clearly specify the basis for prediction to ensure a unique answer.
      - Considering the historical data from 1961 to 1967, what could be the forecasted points of Suzuki in the 50cc and 125cc classes for the upcoming years? Use the average growth rate or the most recent 5-year growth rate for prediction.

  - General Computation Intensive
    > What is the average closing price of Apple Inc. from Sep. 2024 to Oct. 2024?
      - Requires retrieving and processing a large list of numbers. Coding is ESSENTIAL.
      
- **List/Set-wise operations:** sorting (alphabetical, numerical, top-K), sum, average, count, intersection, subtraction, merging  
  - Examples:  
    - Which is the shortest among XXX?  
    - What is the average length of YYY?  
    - How many items appear in both set A and set B?  
    - What is the total number of Z across all categories?  

- **Element-wise operations:** selecting specific elements, performing mathematical operations between elements  
  - Examples:  
    - What is the sum of A’s speed and B’s speed?  
    - By how much does C’s value exceed D’s value?  
    - What is the difference between the population of city X and city Y?  

- **Element-Set operations:** checking membership or counting occurrences  
  - Examples:  
    - Is element E part of the top 10 ranked items?  
    - How many times does item F appear in the list?  
    - Does the name G appear in the set of award winners?  
---
Determine whether the question meets the following criteria. Regardless of whether it does or not, provide the reason.

**Question**
{question}


Question Checking
[ ] Self-Containment: The extent to which the question is fully specified and comprehensible without requiring additional external context.
[ ] Retrieval Necessity: The degree to which answering the question necessitates consulting external sources, while avoiding excessive disclosure of information within the question itself.
[ ] Clarity: The precision and unambiguity of the cues or references embedded in the question that facilitate accurate data retrieval. The clues will not lead to multiple feasible answers.
[ ] Temporal Stability: The property that the correct answer to the question remains consistent over time, unaffected by temporal changes (counterexample: “Who was the immediate past president of the United States?”, whose answer changes with each new administration).

---

Try your best to give insightful advice.
"""

complex_task_checking = """


---
### Composition Reasoning Operations
Incorporate at least three of the following reasoning operations in your question:  
- **Scientific Analysis**
  - Statistical Analysis  
    - Analyze data from various web pages, such as calculating the mean, variance, or standard deviation within a specified time period.
      - What is the median winnings for drivers who have driven a Chevrolet car?
      - Which category exhibits the most consistent growth rate across the 5-year period, and what is the average annual percentage increase for that category?
      - Can you calculate the standard deviation of the average comprehension scores across A, B, and C?

  - Correlation Analysis  
    - Is there a significant correlation between the `area (km²)` and `gdp (billion USD)` of the member countries? Please provide the conclusion and cite the correlation coefficient as evidence.

  - Trend Forecasting  
    - REMEMBER: Clearly specify the basis for prediction to ensure a unique answer.
      - Considering the historical data from 1961 to 1967, what could be the forecasted points of Suzuki in the 50cc and 125cc classes for the upcoming years? Use the average growth rate or the most recent 5-year growth rate for prediction.

  - General Computation Intensive
    > What is the average closing price of Apple Inc. from Sep. 2024 to Oct. 2024?
      - Requires retrieving and processing a large list of numbers. Coding is ESSENTIAL.
      
- **List/Set-wise operations:** sorting (alphabetical, numerical, top-K), sum, average, count, intersection, subtraction, merging  
  - Examples:  
    - Which is the shortest among XXX?  
    - What is the average length of YYY?  
    - How many items appear in both set A and set B?  
    - What is the total number of Z across all categories?  

- **Element-wise operations:** selecting specific elements, performing mathematical operations between elements  
  - Examples:  
    - What is the sum of A’s speed and B’s speed?  
    - By how much does C’s value exceed D’s value?  
    - What is the difference between the population of city X and city Y?  

- **Element-Set operations:** checking membership or counting occurrences  
  - Examples:  
    - Is element E part of the top 10 ranked items?  
    - How many times does item F appear in the list?  
    - Does the name G appear in the set of award winners?  
---
Determine whether the question meets the following criteria. Regardless of whether it does or not, provide the reason.



Question Checking
[ ] Self-Containment: The extent to which the question is fully specified and comprehensible without requiring additional external context.
[ ] Retrieval Necessity: The degree to which answering the question necessitates consulting external sources, while avoiding excessive disclosure of information within the question itself.
[ ] Aggregation Necessity: The question must include at least three different aggregation operations, ensuring that the answer cannot be obtained through direct retrieval.
[ ] Clarity: The precision and unambiguity of the cues or references embedded in the question that facilitate accurate data retrieval. The clues will not lead to multiple feasible answers.
[ ] Temporal Stability: The property that the correct answer to the question remains consistent over time, unaffected by temporal changes (counterexample: “Who was the immediate past president of the United States?”, whose answer changes with each new administration).

Answer Quality Assessment
[ ] Information Fidelity: The extent to which all information presented in the reference answer is fully consistent with the URLs or other provided external information sources.  
    - Example of inconsistency: The temperature retrieved from the reference URL is 37°C, whereas the solution states 35°C, resulting in an erroneous calculation of the average temperature.

[ ] Ground Truth Validity: The reference answer must accurately and unambiguously reflect the requirements of the question, conforming to information obtained from authoritative and reliable data sources.  
    - The answer should be derived from recognized authoritative channels or verified databases.
    - Ensuring verifiability through reliable sources is especially important for questions involving numerical data, statistics, or other factual information.
    - Example of invalid answer: “The moon’s distance from Earth is 100,000 km.” This contradicts scientific consensus, which states the distance is approximately 384,400 km.

[ ] Uniqueness and Unambiguity: The reference answer should be uniquely correct, avoiding ambiguity or multiple plausible solutions.  
    - Are there conflicting data from multiple sources that lead to multiple possible answers?
    - Are there precision conflicts between different data sources (e.g., 33.2 vs. 33.10987)?

Try your best to give insightful advice.

---
Based on the above criteria, analyze the following data:
{task}


"""

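# A minimal sketch of how the templates above might be filled in. It assumes
# they are rendered with str.format, which the doubled braces around the JSON
# examples suggest but the module does not state. `render_prompt` is a
# hypothetical helper introduced here for illustration only.
def render_prompt(template: str, **fields: str) -> str:
    """Substitute named placeholders; '{{' and '}}' become literal braces."""
    return template.format(**fields)


# Possible usage (the field name must match the template's placeholder):
#   render_prompt(eval_prompt, question="...")
#   render_prompt(complex_task_checking, task="...")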
