"""
We need different sets of prompts to build the 'backbone' for the LLM prompts.
This can be broken into different categories. 
'General Explanation' explains the task to the LLM -- the context of what it's doing
as well as the rules for response. 
'Provide History' asks the LLM to look at its previous attempts and
generate a new response based on the data.
'Last Try' informs the LLM that it's on its last attempt and should produce the best tax
rate possible -- the goal is to replace exploration entirely with exploitation.
'Reminder' reminds the LLM of the formatting rules and goals of the task, along
with any other important information we want to add.
We want to be able to use variations of these categories to prompt the LLM differently. 
General Explanation variations:
        -Tell the LLM that it's generating a tax rate (direct)
        -Tell the LLM that it's generating a set of inputs for a function (indirect)
Provide History variations:
        -These should match the vocabulary used in the General Explanation
Last Try variations:
        -Reveal the number of attempts remaining (revealed)
        -Do not reveal the number of attempts remaining (hidden)
Reminder:
        -Have the LLM explain its reasoning BEFORE generating
        the tax rate (reason)
        -Don't ask the LLM to explain its reasoning (noreason)

Different combinations of these categories are denoted by 'direct_revealed_reason',
'indirect_hidden_noreason', etc.

"""

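# Illustrative sketch, not part of the original module: one possible way to read back
# the `$ [a, b, c] $` response format that the prompts below request. The helper name
# `parse_three_values` is hypothetical. It takes the LAST match because several
# reminders ask the LLM to explain its rationale before giving the values.
import ast
import re

# Matches a bracketed list wrapped in single dollar signs, e.g. "$ [0.1, 0.2, 0.3] $".
_DOLLAR_LIST = re.compile(r"\$\s*(\[[^\[\]$]*\])\s*\$")


def parse_three_values(response):
    """Return the last `$ [a, b, c] $` list in `response` as floats, or None if invalid."""
    matches = _DOLLAR_LIST.findall(response)
    if not matches:
        return None
    try:
        values = ast.literal_eval(matches[-1])
    except (ValueError, SyntaxError, TypeError):
        return None
    if len(values) != 3:
        return None
    for value in values:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return None
        # The prompts require each value to lie between 0 and 1 inclusive.
        if not 0 <= value <= 1:
            return None
    return [float(value) for value in values]

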
def get_prompt_backbone(style, num_episodes):
    prompts = {
        "direct_revealed_reason": {
            "general_explanation": f"""You are drafting a three-tiered tax rate for a population where each tax rate must be between 0 and 1 inclusive. 
                You want to maximize the mean income of that population. Your goal is to find the tax rate 
                that achieves the highest mean income. You will only have {num_episodes} attempts total to adjust the tax rate to achieve the maximum mean income
                value, so be bold and make significant changes between early attempts to find the optimal tax rates. Generate 
                a three-tiered tax rate in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value. Make sure it's a valid 
                Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same tax rate from previous attempts --
                you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the three-tiered tax rates you previously generated for that attempt, 
                and the mean income that the tax rate resulted in. Look at your previous attempts, and draft a new tax rate to try to increase the
                mean income as much as possible before you run out of attempts. Here is your history of attempts: """,
            # History will be provided here, as 'Generation number: x, You generated the (tax rate or inputs, dependent on style) y,
            # which received a (return or income, dependent on style) of z'
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value
                between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs. Before you generate the tax rate, please explain the rationale behind your tax rate.""",
            "last_try": f"""This is your final attempt. Please reflect on your history of attempts, and generate the tax rate that you 
                believe will achieve the highest mean income.""",
        },
        "direct_revealed_noreason": {
            "general_explanation": f"""You are drafting a three-tiered tax rate for a population where each tax rate must be between 0 and 1 inclusive. 
                You want to maximize the mean income of that population. Your goal is to find the tax rate 
                that achieves the highest mean income. You will only have {num_episodes} attempts total to adjust the tax rate to achieve the maximum mean income
                value, so be bold and make significant changes between early attempts to find the optimal tax rates. Generate 
                a three-tiered tax rate in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value. Make sure it's a valid 
                Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same tax rate from previous attempts --
                you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the three-tiered tax rates you previously generated for that attempt, 
                and the mean income that the tax rate resulted in. Look at your previous attempts, and draft a new tax rate to try to increase the
                mean income as much as possible before you run out of attempts. Here is your history of attempts: """,
            # History will be provided here, as 'Generation number: x, You generated the (tax rate or inputs, dependent on style) y,
            # which received a (return or income, dependent on style) of z'
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value
                between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs.""",
            "last_try": f"""This is your final attempt. Please reflect on your history of attempts, and generate the tax rate that you 
                believe will achieve the highest mean income.""",
        },
        "direct_hidden_reason": {
            "general_explanation": f"""You are drafting a three-tiered tax rate for a population where each tax rate must be between 0 and 1 inclusive. 
                        You want to maximize the mean income of that population. Your goal is to find the tax rate 
                        that achieves the highest mean income. Generate a three-tiered tax rate in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value. Make sure it's a valid 
                        Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same tax rate from previous attempts --
                        you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the three-tiered tax rates you previously generated for that attempt, 
                        and the mean income that the tax rate resulted in. Look at your previous attempts, and draft a new tax rate to try to increase the
                        mean income as much as possible before you run out of attempts. Here is your history of attempts: """,
            # History will be provided here, as 'Generation number: x, You generated the (tax rate or inputs, dependent on style) y,
            # which received a (return or income, dependent on style) of z'
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value
                        between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs. Before you generate the tax rate, please explain the rationale behind your tax rate.""",
            "last_try": "",
        },
        "direct_hidden_noreason": {
            "general_explanation": f"""You are drafting a three-tiered tax rate for a population where each tax rate must be between 0 and 1 inclusive. 
                        You want to maximize the mean income of that population. Your goal is to find the tax rate 
                        that achieves the highest mean income. Generate a three-tiered tax rate in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value. Make sure it's a valid 
                        Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same tax rate from previous attempts --
                        you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the three-tiered tax rates you previously generated for that attempt, 
                        and the mean income that the tax rate resulted in. Look at your previous attempts, and draft a new tax rate to try to increase the
                        mean income as much as possible before you run out of attempts. Here is your history of attempts: """,
            # History will be provided here, as 'Generation number: x, You generated the (tax rate or inputs, dependent on style) y,
            # which received a (return or income, dependent on style) of z'
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [tax1, tax2, tax3] $ where each tax number must be a floating point value
                        between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs.""",
            "last_try": "",
        },
        "indirect_revealed_reason": {
            "general_explanation": f"""You are tasked with coming up with three numbers to maximize an unknown function. Your goal is to find three inputs
                that achieve the highest return. You will only have {num_episodes} attempts total to adjust these inputs to achieve the maximum return
                value, so be bold and make significant changes between early attempts to find the optimal inputs. Generate 
                these three numbers in the form $ [num1, num2, num3] $ where each number must be a floating point value. Make sure it's a valid 
                Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same numbers from previous attempts --
                you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the inputs you previously generated for that attempt, 
                and the return that the inputs resulted in. Look at your previous attempts, and provide a new set of numbers to try to increase the
                return as much as possible before you run out of attempts. Here is your history of attempts: """,
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [num1, num2, num3] $ where each number must be a floating point value
                between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs. Before you generate your inputs, please explain the rationale behind choosing them.""",
            "last_try": f"""This is your final attempt. Please reflect on your history of attempts, and generate the input that you 
                believe will achieve the highest return.""",
        },
        "indirect_revealed_noreason": {
            "general_explanation": f"""You are tasked with coming up with three numbers to maximize an unknown function. Your goal is to find three inputs
                that achieve the highest return. You will only have {num_episodes} attempts total to adjust these inputs to achieve the maximum return
                value, so be bold and make significant changes between early attempts to find the optimal inputs. Generate 
                these three numbers in the form $ [num1, num2, num3] $ where each number must be a floating point value. Make sure it's a valid 
                Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same numbers from previous attempts --
                you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the inputs you previously generated for that attempt, 
                and the return that the inputs resulted in. Look at your previous attempts, and provide a new set of numbers to try to increase the
                return as much as possible before you run out of attempts. Here is your history of attempts: """,
            # History will be provided here, as 'Generation number: x, You generated the (tax rate or inputs, dependent on style) y,
            # which received a (return or income, dependent on style) of z'
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [num1, num2, num3] $ where each number must be a floating point value
                between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs.""",
            "last_try": f"""This is your final attempt. Please reflect on your history of attempts, and generate the input that you 
                believe will achieve the highest return.""",
        },
        "indirect_hidden_reason": {
            "general_explanation": f"""You are tasked with coming up with three numbers to maximize an unknown function. Your goal is to find three inputs
                that achieve the highest return. Generate
                these three numbers in the form $ [num1, num2, num3] $ where each number must be a floating point value. Make sure it's a valid 
                Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same numbers from previous attempts --
                you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the inputs you previously generated for that attempt, 
                and the return that the inputs resulted in. Look at your previous attempts, and provide a new set of numbers to try to increase the
                return as much as possible before you run out of attempts. Here is your history of attempts: """,
            # History will be provided here, as 'Generation number: x, You generated the (tax rate or inputs, dependent on style) y,
            # which received a (return or income, dependent on style) of z'
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [num1, num2, num3] $ where each number must be a floating point value
                between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs. Before you generate your inputs, please explain the rationale behind choosing them.""",
            "last_try": "",
        },
        "indirect_hidden_noreason": {
            "general_explanation": f"""You are tasked with coming up with three numbers to maximize an unknown function. Your goal is to find three inputs
                that achieve the highest return. Generate
                these three numbers in the form $ [num1, num2, num3] $ where each number must be a floating point value. Make sure it's a valid 
                Python list of floats, and that it's wrapped in a single dollar sign ($) on each side. Avoid repeating the same numbers from previous attempts --
                you can assume that the associated data is correct. """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the inputs you previously generated for that attempt, 
                and the return that the inputs resulted in. Look at your previous attempts, and provide a new set of numbers to try to increase the
                return as much as possible before you run out of attempts. Here is your history of attempts: """,
            # History will be provided here, as 'Generation number: x, You generated the (tax rate or inputs, dependent on style) y,
            # which received a (return or income, dependent on style) of z'
            "reminder": """Please remember that the tax rate that you generate should be in the form $ [num1, num2, num3] $ where each number must be a floating point value
                between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs.""",
            "last_try": "",
        },
        "reveal_apples": {
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    Your goal is to maximize the mean return of the agents. You are able to draft tax rates that will affect the reward 
                    signal for an agent from harvesting an apple. Each 'taxed' reward will be redistributed evenly amongst all of the 
                    agents. In order to maximize the mean return of these agents, you need a tax rate that prevents overharvesting 
                    without discouraging harvesting altogether. You have {num_episodes} total attempts to produce the optimal tax rate. 
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                    mean return of the agents. Aim for a mean return of 60. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"""This is your final attempt. Please reflect on your history of attempts, and generate the input that you 
                believe will achieve the highest return.""",
        },
        "no_return_goal": {
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    Your goal is to maximize the mean return of the agents. You are able to draft tax rates that will affect the reward 
                    signal for an agent from harvesting an apple. Each 'taxed' reward will be redistributed evenly amongst all of the 
                    agents. In order to maximize the mean return of these agents, you need a tax rate that prevents overharvesting 
                    without discouraging harvesting altogether. You have {num_episodes} total attempts to produce the optimal tax rate. 
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                    mean return of the agents. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"""This is your final attempt. Please reflect on your history of attempts, and generate the input that you 
                believe will achieve the highest return.""",
        },
        "in_prog_base": {
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    Your goal is to maximize the mean return of the agents. You are able to draft tax rates that will affect the reward 
                    signal for an agent from harvesting an apple. Each 'taxed' reward will be redistributed evenly amongst all of the 
                    agents. In order to maximize the mean return of these agents, you need a tax rate that prevents overharvesting 
                    without discouraging harvesting altogether. """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the approximate mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                    mean return of the agents. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "in_prog_base_meaning": {
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    Your goal is to maximize the mean return of the agents. You are able to draft tax rates that will affect the reward 
                    signal for an agent from harvesting an apple. Each 'taxed' reward will be redistributed evenly amongst all of the 
                    agents. In order to maximize the mean return of these agents, you need a tax rate that prevents overharvesting 
                    without discouraging harvesting altogether. """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the approximate mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                    mean return of the agents. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "in_prog_v2": {
            "general_explanation": f"""
                    You are overseeing agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    You are charged with creating tax rates that will affect the reward 
                    signal for an agent harvesting an apple. Each 'taxed' reward will be redistributed evenly amongst all of the 
                    agents. Your goal is to maximize the mean return of the agents, with a tax rate that prevents overharvesting 
                    without discouraging harvesting altogether. """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the approximate mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                    mean return of the agents. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "in_prog_v2_meaning": {
        "general_explanation": f"""
                You are overseeing agents in a MARL environment. These agents populate an environment that contains apples, 
                and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                You are charged with creating tax rates that will affect the reward 
                signal for an agent harvesting an apple. Each 'taxed' reward will be redistributed evenly amongst all of the 
                agents. Your goal is to maximize the mean return of the agents, with a tax rate that prevents overharvesting 
                without discouraging harvesting altogether. """,
        "provide_history": """
                The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                previously generated for that attempt,  and the approximate mean income that the tax rate resulted in.
                """,
        "reminder": """
                Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                mean return of the agents. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                (so the tax rate should be the last thing that you generate).
                """,
        "last_try": f"",
        },
        "in_prog_v3": {
            "general_explanation": f"""
                    You are overseeing agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    You are charged with creating tax rates that will affect the reward 
                    signal for an agent harvesting an apple. Your goal is to maximize the mean return of the agents. """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the approximate mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "in_prog_v3_meaning": {
            "general_explanation": f"""
                    You are overseeing agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    You are charged with creating tax rates that will affect the reward 
                    signal for an agent harvesting an apple. Your goal is to maximize the mean return of the agents. """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the approximate mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"",
        },     
        "no_return_goal_no_total_attempts_no_meaning": {
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. However, if all of the apples are harvested before they can regrow, 
                    the environment will be barren for the rest of the episode and the agents will be unable to increase their return. 
                    Your goal is to maximize the mean return of the agents. You are able to draft tax rates that will affect the reward 
                    signal for an agent from harvesting an apple. Each 'taxed' reward will be redistributed evenly amongst all of the 
                    agents. In order to maximize the mean return of these agents, you need a tax rate that prevents overharvesting 
                    without discouraging harvesting altogether. """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                    mean return of the agents. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "hard": {
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and are rewarded when they gather these apples. You are able to draft tax rates that will affect the reward 
                    signal for an agent from harvesting an apple. Your goal is to find a tax rate that maximizes the mean return
                    of these agents over an episode. """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three-tiered tax rate you 
                    previously generated for that attempt, and the mean income that the tax rate resulted in.
                    """,
            "reminder": """
                    Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the 
                    mean return of the agents. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ 
                    where each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the tax rate, please explain your rationale behind choosing it
                    (so the tax rate should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "reveal_apples-og": {
            "general_explanation": f"""You are overseeing 7 agents in a MARL environment. These agents populate an environment
            with several patches of apples, and are rewarded when they gather these apples. However, if all apples in a patch are 
            harvested before there is time for them to regrow, the patch will be barren for the rest of the episode. Your goal is
            to maximize the mean return of the agents. You are able to draft tax rates that will affect the reward signal for an agent from harvesting an apple. Each 'taxed' reward will be redistributed 
            evenly amongst all of the agents. In order to maximize the mean return of these agents, you need a tax rate that does
            not disincentivize apple harvesting, but also does not allow for the agents to harvest all of the apples in a patch too 
            quickly. You only have {num_episodes} total attempts to produce the optimal tax rate.  """,
            "provide_history": """A history of your previous attempts tells you the attempt number, the three-tiered tax rates you previously generated for that attempt, 
                and the mean income that the tax rate resulted in. Look at your previous attempts, and draft a new tax rate to try to increase the
                mean income as much as possible before you run out of attempts. Here is your history of attempts: """,
            "reminder": """Now you will generate the next tax rate to be implemented in the environment, with the goal of
              maximizing the mean return of the agents. You should aim for a mean return of 55. Please provide this tax rate in the form $ [tax1, tax2, tax3] $ where 
              each tax rate must be a floating point value between 0 and 1 inclusive. Do not forget the brackets, the commas, and the dollar signs. Before
              you generate the tax rate, please explain your rationale behind choosing it.""",
            "last_try": f"""This is your final attempt. Please reflect on your history of attempts, and generate the input that you 
                believe will achieve the highest return.""",
        },
        "cleanup_base": {
            #worry: the word incentive has a positive connotation
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and receive a reward of 1 when they gather these apples. There is also a river in the environment that builds pollution
                    at a steady rate. The apples regrow at a rate that is inversely proportional to the pollution in the river, so that if the river
                    is dirty, apples won't regrow and agents will be unable to harvest. Agents can clean the river, but get 0 reward for doing so. 
                    You will produce 3 incentives to be implemented in an episode of this environment, such that agents that harvest an apple will 
                    receive an additional reward of incentive1, agents that clean the river will receive a reward of incentive2,
                    and agents that do some other action will receive a reward of incentive3. You want to maximize the mean number of apples harvested.
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the incentives you 
                    previously generated for that attempt, and the mean income that the incentives resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of z'
            "reminder": """
                    Now you will generate the next set of incentives to be implemented in the environment. Please provide this set of incentives in the form $ [incentive1, incentive2, incentive3] $ 
                    where each incentive must be a floating point value that is nonnegative and not greater than 2. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the incentives, please explain your rationale behind choosing them
                    (so the incentives should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "cleanup_historical": {
            #worry: the word incentive has a positive connotation
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and receive a reward of 1 when they gather these apples. There is also a river in the environment that builds pollution
                    at a steady rate.
                    You will produce 3 incentives to be implemented in an episode of this environment, such that agents that harvest an apple will 
                    receive an additional reward of incentive1, agents that clean the river will receive a reward of incentive2,
                    and agents that do some other action will receive a reward of incentive3. 
                    You want to maximize harvesting.
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the incentives you 
                    previously generated for that attempt, and the mean income that the incentives resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of z'
            "reminder": """
                    Now you will generate the next set of incentives to be implemented in the environment. Please provide this set of incentives in the form $ [incentive1, incentive2, incentive3] $ 
                    where each incentive must be a floating point value that is nonnegative and not greater than 3. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the incentives, please explain your rationale behind choosing them
                    (so the incentives should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "cleanup_historical2": {
            #worry: the word incentive has a positive connotation
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and receive a positive reward when they gather these apples. There is also a river in the environment that builds pollution
                    at a steady rate.
                    You will produce 3 incentives to be implemented in an episode of this environment, such that agents that harvest an apple will 
                    receive an additional reward of incentive1, agents that clean the river will receive a reward of incentive2,
                    and agents that do some other action will receive a reward of incentive3. 
                    Your goal is to maximize agent harvesting.
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the incentives you 
                    previously generated for that attempt, and the level of agent harvesting that the incentives resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of z'
            "reminder": """
                    Now you will generate the next set of incentives to be implemented in the environment. Please provide this set of incentives in the form $ [incentive1, incentive2, incentive3] $ 
                    where each incentive must be a floating point value that is nonnegative and not greater than 3. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the incentives, please explain your rationale behind choosing them
                    (so the incentives should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "all_pollution_in_aux": {
            #worry: the word incentive has a positive connotation
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples and a river. 
                    You will produce 3 incentives to be implemented in an episode of this environment, such that agents that harvest an apple will 
                    receive an additional reward of incentive1, agents that clean the river will receive a reward of incentive2,
                    and agents that do some other action will receive a reward of incentive3. 
                    Your goal is to maximize apple harvesting. 
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the incentives you 
                    previously generated for that attempt, and the level of agent harvesting that the incentives resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of incentives [incentive1, incentive2, incentive3], which gave you a reward of z'
            "reminder": """
                    Now you will generate the next set of incentives to be implemented in the environment. Please provide this set of incentives in the form $ [incentive1, incentive2, incentive3] $ 
                    where each incentive must be a floating point value that is nonnegative and not greater than 3. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the incentives, please explain your rationale behind choosing them
                    (so the incentives should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "cleanup_modifier": {
            #worry: clarity about what a modifier is
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and receive a reward of 1 when they gather these apples. There is also a river in the environment that builds pollution
                    at a steady rate. As the river gets dirty, apples grow more slowly. Agents
                    can clean the river, but don't receive a reward for doing so. You will produce 3 modifiers to be implemented in an episode of this environment, 
                    such that agents that harvest an apple (base reward = 1) will receive an additional reward of modifier1, agents that clean the river (base reward = 0) will receive a reward of modifier2,
                    and agents that do some other action (base reward = 0) will receive a reward of modifier3. Your goal is to maximize the mean number of apples harvested.
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the modifiers you
                    previously generated for that attempt, and the mean income that the modifiers resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of z'
            "reminder": """
                    Now you will generate the next set of modifiers to be implemented in the environment. Please provide this set of modifiers in the form $ [modifier1, modifier2, modifier3] $ 
                    where each modifier must be a floating point value that is nonnegative and not greater than 2. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the modifiers, please explain your rationale behind choosing them
                    (so the modifiers should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "cleanup_clarify_modifier": {
            #worry: clarity about what a modifier is
            "general_explanation": f"""
                    You are overseeing 7 agents in a MARL environment. These agents populate an environment that contains apples, 
                    and receive a reward of 1 when they gather these apples. There is also a river in the environment that builds pollution
                    at a steady rate. The apples regrow at a rate that is inversely proportional to the pollution in the river -- if the river is very polluted, apples will barely grow.
                    Agents can clean the river, but don't receive a reward for doing so. You will produce 3 modifiers to be implemented in an episode of this environment.
                    These modifiers will be applied to the reward signal for an agent based on the action that it takes,
                    such that agents that harvest an apple will see a reward signal of (1 + modifier1), agents that clean the river will see a reward signal of (0 + modifier2),
                    and agents that do some other action will see a reward signal of (0 + modifier3). Your goal is to maximize the mean number of apples harvested.
                    """,
            "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three modifiers you 
                    previously generated for that attempt, and the mean income that the modifiers resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of z'
            "reminder": """
                    Now you will generate the next set of modifiers to be implemented in the environment. Please provide this set of modifiers in the form $ [modifier1, modifier2, modifier3] $ 
                    where each modifier must be a floating point value that is nonnegative and not greater than 2. Do not forget the brackets, the commas, 
                    and the dollar signs. Before you generate the modifiers, please explain your rationale behind choosing them
                    (so the modifiers should be the last thing that you generate).
                    """,
            "last_try": f"",
        },
        "cleanup_vx": {
            'general_explanation': f'''
                You are overseeing agents in a
                MARL environment. The agents populate an environment with apples and a river.
                The agents are rewarded +1 when they harvest an apple, but pollution builds up in the river
                and causes the apples to regrow much more slowly. The agents have the ability to clean the
                river, but aren't rewarded for doing so. You will draft a set of three numbers that will modify
                the reward signal for the agents. The first number will be added to the +1 reward that agents get for harvesting,
                the second number will be added to the 0 reward that agents get for cleaning the river, and the third number will
                be added to the 0 reward that agents get for any other action. Your goal is to maximize the total number of apples
                that the agents are able to eat in an episode. The following history of your previous attempts tells you the attempt number,
                the three modifiers that you came up with, and the total number of apples that the agents were able to eat in that episode.
                ''',
                "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three modifiers you 
                    previously generated for that attempt, and the mean income that the modifiers resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of z'
                "reminder": """
                        Now you will generate the next set of modifiers to be implemented in the environment. Please provide this set of modifiers in the form $ [modifier1, modifier2, modifier3] $ 
                        where each modifier must be a floating point value that is nonnegative and not greater than 2. Do not forget the brackets, the commas, 
                        and the dollar signs. Before you generate the modifiers, please explain your rationale behind choosing them
                        (so the modifiers should be the last thing that you generate).
                        """,
                "last_try": f"",
        },
        "cleanup_unrelated": {
            'general_explanation': f'''
                You are overseeing agents in a
                MARL environment. The agents populate an environment with apples and a river.
                The agents are rewarded +1 when they harvest an apple, but pollution builds up in the river
                and causes the apples to regrow much more slowly. The agents have the ability to clean the
                river, but aren't rewarded for doing so. You will draft a set of three numbers that will modify
                the reward signal for the agents. The first number will be added to the +1 reward that agents get for harvesting,
                the second number will be added to the 0 reward that agents get for cleaning the river, and the third number will
                be added to the 0 reward that agents get for actions other than harvesting or cleaning. Your goal is to maximize the total number of apples
                that the agents are able to eat in an episode. The following history of your previous attempts tells you the attempt number,
                the three modifiers that you came up with, and the total number of apples that the agents were able to eat in that episode.
                ''',
                "provide_history": """
                    The following history of your previous attempts tells you the attempt number, the three modifiers you 
                    previously generated for that attempt, and the mean income that the modifiers resulted in.
                    """,
                    #Here, the LLM will be given the history of their previous attempts in the form of 
                    # 'Generation x: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of y'
                    # 'Generation x+1: You generated the set of modifiers [modifier1, modifier2, modifier3], which gave you a reward of z'
                "reminder": """
                        Now you will generate the next set of modifiers to be implemented in the environment. Please provide this set of modifiers in the form $ [modifier1, modifier2, modifier3] $ 
                        where each modifier must be a floating point value that is nonnegative and not greater than 2. Do not forget the brackets, the commas, 
                        and the dollar signs. Before you generate the modifiers, please explain your rationale behind choosing them
                        (so the modifiers should be the last thing that you generate).
                        """,
                "last_try": f"",
        },
        "cleanup_no_cost":{
            'general_explanation': f''' You are overseeing agents in a
            MARL environment. The agents populate an environment with apples and a river.
            The agents are rewarded +1 when they harvest an apple, but pollution builds up in the river
            and causes the apples to regrow much more slowly. The agents have the ability to clean the
            river, but aren't rewarded for doing so. You will draft a set of three numbers that will modify
            the reward signal for the agents. The first number will be added to the +1 reward that agents get for harvesting,
            the second number will be added to the 0 reward that agents get for cleaning the river, and the third number will
            be added to the 0 reward that agents get for some other action. Your goal is to maximize the total number of apples
            that the agents are able to eat in an episode.''',

            "provide_history": '''The following history of your previous attempts tells you the attempt number, the three modifiers that you
            came up with, and the total number of apples that the agents were able to eat in that episode.''',

            "reminder": """Please generate the next set of modifiers that you think will enable the agents to 
            harvest the most apples through the episode. Provide these modifiers in the form $ [modifier1, modifier2, modifier3] $ where each modifier
            must be a floating point value that is nonnegative and not greater than 2, surrounded by brackets and separated by commas. The list should be wrapped in dollar signs.
            Before you generate the modifiers, please explain your rationale behind choosing them.""",
        },
        "harvest-historical_meaning": {
            "general_explanation": f""" You are overseeing agents in a MARL environment. 
            These agents populate an environment that contains apples they can collect. 
            However, if all of the apples are harvested before they can regrow, the environment will be barren for the rest of the episode. 
            Without any tax rates, the agents overharvest, leaving none to regrow and getting a low mean return. 
            You are charged with creating tax rates that will affect the reward signal for an agent harvesting an apple. 
            Your goal is to maximize the mean return of the agents.""",
            "provide_history": """Below, you are given the history of your previous attempts. """,
            "reminder": """Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the mean return of the agents. 
            Please provide this tax rate in the form $ [tax1, tax2, tax3] $ where each tax rate must be a floating point value between 0 and 1 inclusive. 
            Do not forget the brackets, the commas, and the dollar signs. 
            Before you generate the tax rate, please explain your rationale behind choosing it (so the tax rate should be the last thing that you generate). """,
        },
        "harvest-historical": {
            "general_explanation": f""" You are overseeing agents in a MARL environment. 
            These agents populate an environment that contains apples they can collect. 
            However, if all of the apples are harvested before they can regrow, the environment will be barren for the rest of the episode. 
            Without any tax rates, the agents overharvest, leaving none to regrow and getting a low mean return. 
            You are charged with creating tax rates that will affect the reward signal for an agent harvesting an apple. 
            Your goal is to maximize the mean return of the agents.""",
            "provide_history": """Below, you are given the history of your previous attempts. """,
            "reminder": """Now you will generate the next tax rate to be implemented in the environment, with the goal of maximizing the mean return of the agents. 
            Please provide this tax rate in the form $ [tax1, tax2, tax3] $ where each tax rate must be a floating point value between 0 and 1 inclusive. 
            Do not forget the brackets, the commas, and the dollar signs. 
            Before you generate the tax rate, please explain your rationale behind choosing it (so the tax rate should be the last thing that you generate). """,
        },
    }

    return prompts[style]
