# A ReAct style prompt for the hypothesis agent
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage

'''
hypothesis_agent_system_prompt = """
Your goal is to identify cause-effect relationships from data. 
Suppose each variable in the dataset is a node in a graph. 
Feel free to reason on it and provide a brief explanation. 
At the end of your response, please output a list of tuples representing directed edges between two nodes. To make later parsing easy, include it within <edges><edges> tags.

Use an iterative thought-action-observation process to refine your reasoning and make tool calls when appropriate.
If necessary, the process can be repeated after you have observed the result of the action.

Example 1:

Input variables: CO2 emissions, energy consumption, and global temperature increase
Thought 1: I will first examine the dataset and the description, if provided. I know that CO2 emissions can influence global temperature; I need to confirm there are no hidden confounders.
Action 1: Web Search: How do CO2 emissions relate to global temperature increase? Are there other effects or confounders? Is it just a correlation?
Observation 1: The web search output summary confirms that the scientific community mostly agrees that CO2 emissions are a direct cause of global temperature rise.
Thought 2: Based on the web search, I conclude that CO2 emissions influence global temperature increase. How does energy consumption fit in?
Action 2: Web Search: How is energy consumption related to CO2 emissions and global temperatures? Does increased consumption cause increased emissions, or vice-versa?
Observation 2: The output summary states that higher energy consumption implies higher emissions in most countries, but has not been confirmed to directly cause temperature increase.
Thought 3: The possible relationships between these three phenomena are now clear: consumption causes emissions, which in turn cause temperature increase.
Finish: <edges>(energy_consumption, CO2_emissions), (CO2_emissions, global_temperature_increase)<edges>

Example 2:

Input variables: Exercise frequency, weight loss, calorie intake, and metabolic rate
Thought 1: I will analyze the dataset to see how these variables are defined. I suspect that increased exercise frequency leads to weight loss, but I need to confirm the role of calorie intake.
Action 1: Web Search: How does exercise frequency affect weight loss and calorie intake?
Observation 1: The web search output summary indicates that increased exercise frequency generally leads to weight loss, especially when combined with a controlled calorie intake.
Thought 2: Based on common knowledge, I can infer that a higher metabolic rate is also a factor in weight loss, but I need to confirm its relationship with exercise frequency.
Action 2: Web Search: How does metabolic rate influence weight loss in relation to exercise frequency and calorie intake?
Observation 2: The output summary confirms that a higher metabolic rate can enhance the effects of exercise on weight loss, particularly when calorie intake is managed.
Thought 3: Therefore, I conclude that exercise frequency influences weight loss, calorie intake plays a role, and metabolic rate enhances the relationship between exercise and weight loss.
Finish: <edges>(exercise_frequency, weight_loss), (calorie_intake, weight_loss), (metabolic_rate, weight_loss)<edges>

Example 3:

Input variables: Education level, income, job satisfaction, and work-life balance
Thought 1: I will first review the dataset to understand how these variables are defined. It is reasonable that higher education levels can lead to increased income, but I need to check.
Action 1: Web Search: What is the relationship between education level and income?
Observation 1: The web search output summary confirms that higher education levels generally lead to higher income levels.
Thought 2: Based on common knowledge, I can infer that higher income often correlates with greater job satisfaction and improved work-life balance.
Thought 3: Therefore, I conclude that education level influences income, which in turn affects job satisfaction and work-life balance.
Finish: <edges>(education_level, income), (income, job_satisfaction), (income, work_life_balance)<edges>

Begin!

{variables_names}
{dataset_description}
"""
'''

# If needed, add some examples of tool calls.
hypothesis_agent_system_prompt = """You are an expert in the {domain} domain tasked with identifying cause-effect relationships from data. Your goal is to construct a causal graph where nodes represent variables, and directed edges represent cause-effect relationships. You will receive a list of variables and, optionally, a dataset description.
You have access to tools to assist in verifying and refining your hypotheses. Use these tools iteratively to gather evidence, validate relationships, and refine your causal graph. Be systematic in your approach, reasoning through each step and leveraging the tools to support your conclusions.
First, extend the provided description and try to understand what each variable means. You can use the available tools if additional information is needed.

Tools available:
{tools}
"""

"""
Use the following format:

Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: Pairs of directed edges representing the causal relationships between variables, in the format (x, y). To ensure easy parsing, enclose the list within <edges><edges> tags.

IMPORTANT: First, extend the provided description and try to understand what each variable means. You can use the available tools if additional information is needed.
IMPORTANT: Please call the available tools to validate your assumptions and hypotheses.
"""
hypothesis_agent_example = [
    HumanMessage(content="Variable names: CO2_emissions, energy_consumption, global_temperature_increase"),
    AIMessage(content="I know that CO2 emissions can influence global temperature; I need to confirm there are no hidden confounders.", name="HypothesisAgent"),
    AIMessage(content="", name="HypothesisAgent", tool_calls=[{"name": "rag_assistant", "args": {"search_topic": "How do CO2 emissions relate to global temperature increase? Are there other effects or confounders? Is it just a correlation?"}, "id": "1"}]),
    ToolMessage("The scientific community mostly agrees that CO2 emissions are a direct cause of global temperature rise", tool_call_id="1"),
    AIMessage(content="I conclude that CO2 emissions influence global temperature increase. But, how does energy consumption fit in?", name="HypothesisAgent"),
    AIMessage(content="", name="HypothesisAgent", tool_calls=[{"name": "rag_assistant", "args": {"search_topic": "How is energy consumption related to CO2 emissions and global temperatures? Does increased consumption cause increased emissions, or vice-versa?"}, "id":"2"}]),
    ToolMessage("Higher energy consumption implies higher emissions in most countries, but has not been confirmed to directly cause temperature increase.", tool_call_id="2"),
    AIMessage(content="The possible relationships between these three phenomena are now clear: consumption causes emissions, which in turn cause temperature increase. Final Answer: <edges>(energy_consumption, CO2_emissions), (CO2_emissions, global_temperature_increase)<edges>", name="HypothesisAgent")
]
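
# A minimal sketch (not part of the original module) of how the <edges> payload
# requested by the prompts above could be parsed. The helper name `parse_edges`
# is hypothetical. It assumes edges are written as "(source, target)" pairs and
# that variable names contain no commas or parentheses. It accepts either a
# repeated <edges> tag, as the prompts ask for, or a conventional </edges>
# closing tag, and it reads the last block found in the text.
import re


def parse_edges(text: str) -> list[tuple[str, str]]:
    """Return the directed edges found in the last <edges> block of `text`."""
    blocks = re.findall(r"<edges>(.*?)</?edges>", text, flags=re.DOTALL)
    if not blocks:
        return []
    # Each edge is "(source, target)"; surrounding whitespace is dropped.
    pairs = re.findall(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)", blocks[-1])
    return [(source, target) for source, target in pairs]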

hypothesis_agent_user_prompt = """Extract a causal graph from the following dataset.
Variable names: {variable_names}
Dataset description: {dataset_description}

Next, perform the following actions iteratively until you are confident in a final answer:
(1) Reason on the relationships between those variables. You can use the tools to search for more information.
(2) Provide a list of directed edges between those variables.
(3) Provide a rational critique of the newly found edges.
(4) Reflect and improve your estimate.

Finally, output the cumulative list of directed edges found at each iteration for every group of variables.
"""
"""
First, extend the provided description and try to understand what each variable means. You can use the available tools if additional information is needed.
Then, choose clusters of variables that are likely to have causal relationships.

"""

critic_agent_system_prompt = """
You are an expert in the {domain} domain tasked with evaluating proposed causal relationships between variables in a dataset. Your goal is to critically analyze a hypothesized causal graph and suggest modifications.
Focus particularly on removing edges that do not logically hold true. Use an iterative thought-action-observation process to refine your reasoning and make tool calls when appropriate.

At the end of your response, please output a list of tuples representing directed edges between two nodes. To make later parsing easy, include it within <edges><edges> tags.

Use the following format:
Variable names: names of the variables, corresponding to the nodes in the causal graph
Dataset description: optional, more information on the variables
Previous messages: the reasoning steps of the Hypothesis agent, terminating in a list of edges between <edges><edges> tags.
Thought: analyzing and planning what to do
Action: which action to take, which should be a tool call
Observation: result of the action
Thought: how the result of the action can help and how you can revise the input using it
... (this Thought/Action/Observation cycle can be repeated)
Thought: I have now reached the final answer
Final answer: Here is the revised causal graph: <edges>...<edges>

IMPORTANT: Make sure the final answer contains the list of edges between the <edges> tags.

Example:
Variable names: 
"iq,gpa,income,study_hours,age"
Dataset summary:
"This synthetic dataset contains information about student performance and related factors. Each entry includes the following variables:
IQ: A measure of the student's intelligence quotient.
GPA: The student's grade point average, reflecting their academic success on a scale from 0.0 to 4.0.
Income: The annual income of the student.
Study Hours: The average number of hours the student dedicates to studying each week.
Age: The age of the student.
The dataset is designed to explore the relationships between cognitive ability, academic performance, financial outcomes, study habits, and age among students."
Previous message:
"We can construct a causal graph that highlights the relationships among the variables in the dataset: IQ, GPA, income, study hours, and age. Here's a breakdown of the relationships:
1. IQ and GPA: There is a positive correlation between IQ and GPA, suggesting that higher IQ may contribute to better academic performance. However, this relationship is nuanced, and other factors like motivation and educational environment also play significant roles. Thus, we can represent this as a directed edge from IQ to GPA, but it's not strictly causal.
2. Study Hours and GPA: There is a causal relationship where increased study hours are associated with improved GPA. Research indicates that consistent and dedicated study time significantly contributes to better academic outcomes. Therefore, we will include a directed edge from study_hours to GPA.
3. Income and GPA: Higher income is associated with higher GPAs, indicating that students from wealthier backgrounds tend to perform better academically due to access to resources such as tutoring and extracurricular activities. This suggests a causal edge from income to GPA.
4. GPA and Income: Conversely, higher GPA is also a strong predictor of future income, creating a reciprocal relationship. This indicates another directed edge from GPA to income.
5. Age and Study Habits: Older students tend to have better study habits and academic performance due to greater maturity and cognitive development. This suggests a directed edge from age to study_hours.
Based on these relationships, we can summarize the causal graph as follows:
- `IQ` → `GPA`
- `study_hours` → `GPA`
- `income` → `GPA`
- `GPA` → `income`
- `age` → `study_hours`
Thus, the final representation of the directed edges in the causal graph is:
<edges>(IQ, GPA), (study_hours, GPA), (income, GPA), (GPA, income), (age, study_hours)<edges>"
Thought 1: The proposed reasoning suggests a bidirectional edge between GPA and income. Perhaps further research is necessary.
Action 1: Web Search: Does higher income lead to better GPA? Could there be other more relevant factors?
Observation 1: The search suggests that higher income relates to better GPA, but not the opposite.
Thought 2: I will remove the (GPA, income) edge. Can I disprove the claim that higher IQ causes better GPA?
Action 2: Web Search for counterfactual evidence that IQ influences GPA
Observation 2: No evidence suggests that the (IQ, GPA) edge is invalid, so I will keep it.
Thought 3: I have no further improvements on the proposed graph.
Final answer: Here is the refined causal graph: <edges>(IQ, GPA), (study_hours, GPA), (income, GPA), (age, study_hours)<edges>

Begin!"""

critic_agent_user_prompt = """
{variable_names}
{dataset_description}

Hypothesis graph: {causal_graph}
"""


new_critic_user_prompt = """Given the {domain} hypothesis below in the form of a causal graph, your goal is to critically analyze it and suggest modifications.
Focus particularly on removing edges that do not logically hold true. 

Current Hypothesis: {edge_list}

At the end of your response, please output a list of tuples representing directed edges between two nodes. To make later parsing easy, include it within <edges><edges> tags."""
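
# A small sketch, not taken from the original code, of how a critic revision
# could be compared against the hypothesis graph it was given, for example to
# log which edges the critic removed. `diff_edges` is a hypothetical helper; it
# assumes both graphs are lists of (source, target) tuples.
def diff_edges(hypothesis, revised):
    """Return (removed, added) edge lists, preserving the input order."""
    hypothesis_set, revised_set = set(hypothesis), set(revised)
    removed = [edge for edge in hypothesis if edge not in revised_set]
    added = [edge for edge in revised if edge not in hypothesis_set]
    return removed, added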


# Old divide prompt when it was also used to expand the description
divide_agent_system_prompt = """You are an expert in the {domain} domain tasked with analyzing a dataset to describe its variables and partition them into groups that are likely to have causal relationships. In other words, you will be identifying clusters of variables that share causality and semantics. Separate groups of variables that are likely to be independent from each other.
You will receive a list of variables and, optionally, a dataset description. Each variable should be treated as a node in a graph. Your goal is to output groups of nodes.
Utilize the available tools to look up the meanings of variable names and explore potential relationships among them. Employ an iterative thought-action-observation approach to gather evidence, validate your reasoning, and refine both the dataset description and the groupings. Be systematic in your approach, reasoning through each step and leveraging the tools to support your conclusions.

Tools available:
{tools}

Example of a tool call:
"""

divide_agent_example = [
    HumanMessage(content="Variable names: CO2_emissions, energy_consumption, global_temperature_increase, deforestation_rate, renewable_energy_usage, industrial_activity. Description: A dataset on the environmental impact of industrial activity."),
    AIMessage(content="I will expand the dataset description to clarify the meaning of each variable.", name="DivideAgent"),
    AIMessage(content="", name="DivideAgent", tool_calls=[{"name": "rag_assistant", "args": {"research_topic": "Definition of carbon dioxide emissions, energy consumption, industrial activity."}, "id": "1"}]),
    ToolMessage("Carbon dioxide emissions refer to the amount, usually by tons per year, of carbon dioxide gas released into the atmosphere, primarily from burning fossil fuels. Energy consumption is the total energy used, typically measured in kWh over a specific time period, by individuals, industries, and nations. Industrial activity encompasses the intensity of production processes in one or multiple countries, measured by market size or other economic indicators.", tool_call_id="1"),
    AIMessage(content="", name="DivideAgent", tool_calls=[{"name": "rag_assistant", "args": {"research_topic": "Definition of global temperature increase, deforestation rate, renewable energy usage."}, "id": "2"}]),
    ToolMessage("Global temperature increase refers to the rise in Earth's average surface temperature due to climate change. Deforestation rate measures the loss of forested areas over time, impacting carbon storage. Renewable energy usage indicates the proportion of energy derived from renewable sources.", tool_call_id="2"),
    AIMessage(content="The dataset description has been expanded. Now, I will group the variables based on their semantic similarity and potential causal connections.", name="DivideAgent"),
    AIMessage(content="The variables [CO2_emissions, energy_consumption, global_temperature_increase] are semantically similar and likely causally related, all pertaining to climate change and energy use.", name="DivideAgent"),
    AIMessage(content="Enhanced description: <general_description>This dataset focuses on the environmental impact of industrial activities. It focuses on how various factors like emissions, use of natural resources and energy consumption contribute to climate change.</general_description> Proposed groups: \n<group>\n <nodes>[CO2_emissions, energy_consumption, global_temperature_increase]</nodes>\n <description>CO2 emissions represent the carbon dioxide released into the atmosphere over a time period, energy consumption measures the total energy used in kWh, and global temperature increase indicates the rise in Earth's average temperature.</description>\n</group>\n<group>\n <nodes>[deforestation_rate, renewable_energy_usage, industrial_activity]</nodes>\n <description>Deforestation rate measures the loss of forested areas over time, renewable energy usage indicates the proportion of energy derived from renewable sources, and industrial activity reflects the intensity of production processes.</description>\n</group>", name="DivideAgent"),
]

divide_agent_user_prompt = """You are tasked with analyzing the following dataset:
Variable names: {variable_names}
Dataset description: {dataset_description}

Your objectives are:
(1) Refine and expand the Dataset description, if present. Include it within <general_description></general_description> tags. Provide an informative description of the dataset labels, clarifying the meaning of acronyms and context of each variable. Preserve all information in the above description, if present.
(2) Reason on how variables might be grouped. Put emphasis on their meaning, and separate variables which are likely to not have causal relationships among them. In other words, obtain independent groups.
(3) Group Variables: Analyze the list of variables and partition them into subgroups that are likely to be causally connected. Consider how the variables might interact. Include detailed info on each variable within each group.

Utilize an iterative thought-action-observation approach to refine your groupings. You have access to tools to gather prior information and validate your reasoning. Document your reasoning for each grouping and summarize the characteristics of each subgroup.

IMPORTANT: Web search cannot understand acronyms. If variable names are acronyms, do not use those acronyms for a web search. Instead, make the query in natural language.
IMPORTANT: Variable labels in output should be the same as in the input. Each input variable should be assigned to a group. Double check as to not introduce new variables by accident.
IMPORTANT: Output a list of variables and their descriptions for each group. Ensure that each list is enclosed within <nodes> tags. For example: <group><nodes>[Variable1, Variable2]</nodes><description>Description of the group</description></group>
"""
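
# A minimal sketch of how the <group> blocks requested above could be parsed.
# `parse_groups` is a hypothetical helper, not part of the original module. It
# assumes each group follows the
# <group><nodes>[...]</nodes><description>...</description></group> layout and
# that variable names contain no commas or square brackets.
import re


def parse_groups(text):
    """Return a list of {"nodes": [...], "description": "..."} dictionaries."""
    groups = []
    for block in re.findall(r"<group>(.*?)</group>", text, flags=re.DOTALL):
        nodes = re.search(r"<nodes>\s*\[(.*?)\]\s*</nodes>", block, flags=re.DOTALL)
        description = re.search(r"<description>(.*?)</description>", block, flags=re.DOTALL)
        names = nodes.group(1).split(",") if nodes else []
        groups.append({
            "nodes": [name.strip() for name in names if name.strip()],
            "description": description.group(1).strip() if description else "",
        })
    return groups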
divide_agent_iterative_prompt = """Analyze the following group of variables. Determine if further division is necessary.

Variable names: {variable_names}

Your objectives are:
(1) Reassess potential causal relationships within this group. Is this group composed of separate sets of independent variables?
(2) ONLY if needed, divide the group into smaller, meaningful subgroups and describe each.

Use the available tools to gather additional information if necessary.
Use an iterative thought-action-observation approach to refine your groupings. Document your reasoning and summarize a description for each subgroup.

IMPORTANT: Variable labels should be the same as in the input. Each input variable should be assigned to a subgroup. Double check as to not introduce new variables by accident.
IMPORTANT: Ensure each subgroup is enclosed within <group> tags. For example: 
<group><nodes>[Variable1, Variable2]</nodes><description>Description of the subgroup</description></group>
"""
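
# A hedged sketch of a check for the two IMPORTANT constraints stated in the
# divide prompts: every input variable is assigned to a group and no new
# variable is introduced. `check_partition` is a hypothetical helper; it
# assumes `groups` is a list of lists of variable names.
def check_partition(variables, groups):
    """Return the variables left unassigned and the names that were not in the input."""
    assigned = [name for group in groups for name in group]
    missing = [name for name in variables if name not in assigned]
    unknown = [name for name in assigned if name not in variables]
    return {"missing": missing, "unknown": unknown}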


### POPPER PROMPTS
# Use them as examples

TEST_PROPOSAL_AGENT_SYSTEM_PROMPT = """You are an expert in the field of {domain}."""

TEST_PROPOSAL_AGENT_USER_PROMPT = '''
Given a {domain} hypothesis "{main_hypothesis}", your goal is to propose a novel falsification test given the available {domain} data sources. 
A falsification test is a test that can potentially falsify the main hypothesis. 
The outcome of the falsification test is to return a p-value that measures the evidence to falsify the main hypothesis.

Notably, the falsification test should satisfy the following property: if the main hypothesis is null, then the falsification sub-hypothesis should also be null.

Here are the list of available data sources, and you can directly call the dataframe as it has already been loaded; no need to load from file path. Each is a pandas dataframe with columns and example rows:

{data}

For the final test, return
(1) Name: name of the test
(2) Test description: be clear and concise. Describe the falsification outcomes.
(3) Null sub-hypothesis h_0: what statistical null sub-hypothesis does this falsification test aim to test?
(4) Alternate sub-hypothesis h_1: what statistical alternative sub-hypothesis does this falsification test aim to test?

Here are the falsification tests that you've created in the previous rounds and their corresponding test results:

"""
{existing_falsification_test}
"""

You may use this information to formulate your next sub-hypothesis and falsification test, but make sure the proposed falsification test is non-redundant with any of the existing tests.

The proposed test should also avoid these failed falsification tests in the previous rounds:

"""
{failed_falsification_test}
"""

A good falsification test should serve as strong evidence for the main hypothesis. However, make sure it is answerable with the given available data sources.
You should aim to maximize the implication strength of the proposed falsification test using the relevant parts of the provided data.

---- 
First produce an initial falsification test proposal.

Then, in each round i, you will do the following:
(1) critic: ask if the main hypothesis is null, is this test also null? be rigorous. this is super important, otherwise, the test is invalid. Is it redundant on capabilities with existing tests? Is it overlapping with failed tests? Can this be answered and implemented based on the given data? 
(2) reflect: how to improve this test definition. 

If you think the test definition is good enough, return the final test definition to the user. 
If not, either refine the test definition so that it improves on the previous one or propose a new test definition, then go to the next round.
'''

REACT_CODING_AGENT_SYSTEM_PROMPT = """You are an expert statistician specialized in the field of {domain}. Given a Falsification Test, your task is to determine if you can reject the null hypothesis via rigorous data analysis and statistical testing.

You have access to multiple datasets relevant to the hypothesis, as well as a python code execution environment to run your falsification test. The code execution environment has a persistent global namespace, meaning that states and variable names will persist through multiple rounds of code executions. Be sure to take advantage of this by developing your falsification test incrementally and reflecting on the intermediate observations at each step, instead of coding up everything in one go. All datasets have already been loaded into the global namespace as pandas dataframes."""

template = """{system_prompt}

You have access to the following tools:
{tools}

Use the following format:

Falsification Test: description of a hypothesis falsification test that you need to implement
Datasets: the names and descriptions of datasets relevant to the input falsification test
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final output from the falsification test (i.e., whether you are able to reject the null hypothesis with statistical significance). Make sure to also include the p-value of the statistical test written in scientific notation.

IMPORTANT: Please make sure the Final Answer includes the p-value of the falsification test regardless of whether you are able to reject the null hypothesis. **Only return the Final Answer if you have obtained a non-zero p-value**. When printing p-values, please use scientific notation instead of the raw number.

IMPORTANT: Please avoid p-hacking! Be fair and rigorous.

Note: all datasets have already been loaded into the global namespace as Pandas dataframes. You may access the data by referring to the EXACT dataframe names as provided in the "Datasets:" section.

--------------------------------------------
Example
Falsification Test:
{{
    "Falsification Test name": "Body Length Evolution and Speciation Rate Relationship Test",
    "Falsification Test description": "Testing for a significant positive relationship between maximum body length evolution rate and spatial variation in speciation rates.",
    "Falsification Test Null hypothesis": "There is no statistically significant positive relationship between the rate of maximum body length evolution and spatial variation in speciation rates.",
    "Falsification Test Alternate hypothesis": "There is a statistically significant positive relationship between the rate of maximum body length evolution and spatial variation in speciation rates."
}}
Datasets:
{{
    "name": "df_body-size-evolution-in-south-american-freshwater-fishes",
    "description": "Data on body size evolution in South American freshwater fishes, including speciation and extinction rates",
    "columns": {{
        "raw": [
            {{
                "name": "HYBAS_ID",
                "description": "Unique identifier for each hydrological basin"
            }},
            {{
                "name": "long",
                "description": "Longitude of the basin location"
            }},
            {{
                "name": "lat",
                "description": "Latitude of the basin location"
            }},
            {{
                "name": "BAMM_speciation",
                "description": "Rate of speciation as calculated by the BAMM method"
            }},
            {{
                "name": "BAMM_extinction",
                "description": "Rate of extinction as calculated by the BAMM method"
            }},
            {{
                "name": "BAMM_NetDiv",
                "description": "Net diversification rate, calculated as speciation minus extinction"
            }},
            {{
                "name": "aet",
                "description": "Mean annual evapotranspiration for each basin"
            }},
            {{
                "name": "Elevation",
                "description": "Average elevation of the basin"
            }},
            {{
                "name": "sgr",
                "description": "Species growth rate in each basin"
            }},
            {{
                "name": "soil_div",
                "description": "Soil diversity index for each basin"
            }},
            {{
                "name": "area",
                "description": "Total area of the basin in square kilometers"
            }},
            {{
                "name": "diversity",
                "description": "Diversity index for the species in each basin"
            }}
        ]
    }}
}}
Thought: First, I need to load the dataset from the global namespace in Python and inspect the data to identify the relevant columns for this hypothesis test.
Action: python_repl_ast
Action Input: import pandas as pd\n\n# Hyphenated names are not valid identifiers, so look the dataframe up by name\ndf = globals()['df_body-size-evolution-in-south-american-freshwater-fishes']\ndf.head()
Observation: 
     HYBAS_ID       long       lat  BAMM_speciation  BAMM_extinction  BAMM_NetDiv  ...   aet    Elevation  sgr  soil_div     area  diversity
0  6050000010 -76.477422  7.742693         0.137392         0.026807     0.110585  ...  1387   330.150088  166  0.482402  72363.7         68
1  6050000740 -74.628725  9.803586         0.117235         0.025796     0.091438  ...  1082    69.475294   23  0.457436  17944.3         35
2  6050068100 -75.295995  8.448815         0.119381         0.023826     0.095555  ...  1312   143.032178   74  0.378793  17105.5         44
3  6050068110 -74.608408  8.922863         0.132477         0.027777     0.104700  ...  1445    14.724138    3  0.468328    610.1         48
4  6050070260 -75.591588  5.770093         0.120127         0.022940     0.097187  ...  1371  1378.729945  421  0.158870  61901.9         81
[5 rows x 21 columns]
Thought: Now that the dataset is loaded and I can see the columns, I need to perform a statistical test to assess the significance of the relationship between 'BAMM_speciation' and 'BAMM_NetDiv'.
Action: python_repl_ast
Action Input: from scipy.stats import linregress\n\n# Perform linear regression to test for a statistically significant relationship\nresult = linregress(df['BAMM_speciation'], df['BAMM_NetDiv'])\ncoefficient = result.slope\np_value = result.pvalue\ncoefficient, "{{:.2e}}".format(p_value)
Observation: (0.5175306498596297, 3.50e-03)
Thought: The linear regression analysis provides a coefficient of approximately 0.518, indicating a positive relationship, and the p-value is 3.50e-03, which is statistically significant at the 0.05 level. Based on this, I can reject the null hypothesis in the falsification test.
Final Answer: Falsification test passes. The null hypothesis is rejected with a p-value of 3.50e-03.
--------------------------------------------

Remember, your output should always **exactly** follow the aforementioned format:
Falsification Test: description of a hypothesis falsification test that you need to implement
Datasets: the names and descriptions of datasets relevant to the input falsification test
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final output from the falsification test (i.e., whether you are able to reject the null hypothesis with statistical significance). Make sure to also include the p-value of the statistical test written in scientific notation.

**IMPORTANT**
You should ALWAYS report a p-value EXACTLY AS IT IS. If a p-value is 4.2e-01, report 4.2e-01, DO NOT REPORT 4.2e-02!
BE CAREFUL WHEN READING THE P-VALUE RESULTS, MISREPORTING A P-VALUE IS WORSE THAN HAVING NO P-VALUE AT ALL.
When reading p-values, make sure the sample sizes and the statistical test are valid.
Please make sure to always return ONE valid p-value. If there are multiple p-values produced by the test, aggregate them in a meaningful and rigorous way.
** Always make sure the returned p-value matches your conclusion for the falsification test. For example, if you reject H0 but find out that H1 is also incorrect (e.g., the suggested shape or relationship is wrong), you SHOULD NOT return a p-value < 0.05.
If you think it's impossible to find a valid p-value for the falsification test, return a p-value of 1.00e+00.
DO NOT perform p-hacking.

Begin!

{input} {agent_scratchpad}"""
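
# A minimal sketch of how the p-value could be read back from the agent's
# Final Answer, given that the template above asks for scientific notation.
# `extract_p_value` is a hypothetical helper, not part of the original module.
# Taking the last number in scientific notation is an assumption, since an
# answer may quote other numbers in the same format before the p-value.
import re


def extract_p_value(final_answer):
    """Return the last scientific-notation number in the text, or None if absent."""
    matches = re.findall(r"[-+]?\d+(?:\.\d+)?[eE][-+]?\d+", final_answer)
    return float(matches[-1]) if matches else None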