system_prompt = """You are an expert in SQL and Pandas code."""

sql2traj_user_prompt = """Action Trajectory (AT) is a piece of pandas-like code that serves as an intermediate representation between natural language and SQL. AT abstracts the logical structure of SQL in a readable, sequential format. Each line of AT is an atomic data transformation operation using pandas-like syntax. Your task is to convert a list of SQL queries (including SELECT, INSERT, UPDATE, MERGE, etc.) into step-by-step AT using a pandas-like API. IMPORTANT: The action trajectory must accurately reflect the logical flow of the original SQL, and it should be detailed and precise enough that it can be programmatically converted back into exactly the same original SQL query.

General Rules of AT:
1. Only Select What's Asked: Only include columns explicitly requested in the SELECT clause of the SQL. Do not select or project any extra/unrequested columns.
2. Step-by-Step Reasoning: Break down the SQL query into logical steps using pandas-like chaining. Do not generate additional comments. Assign intermediate results to df1, df2, etc., with the final result stored in res.
3. Translate every SQL query into an action, including CREATE, DROP, ALTER, etc. Never skip a query or say “no trajectory needed”; every query must result in a symbolic action. For example, you could treat the 'create index' action as 'res = df.create_index(on=book.title, name='idx_book_title')'
4. SQL functions follow pandas syntax: Action trajectory is NOT real pandas code. You are NOT executing dataframes. The pandas-like syntax is used symbolically to represent SQL logic. Each function in AT should correspond to a specific SQL keyword or clause (`select`, `where`, `groupby`, etc.) following pandas syntax, and each line reflects one atomic SQL operation. For instance, select(element=...) for SELECT, filter(cond=...) for WHERE.
5. Focus only on the semantic logic and transformative steps of the query. Omit actions that are: 1) Structurally necessary but semantically irrelevant, such as JOIN, MERGE, CTEs or subquery wrapping — assume that all needed columns from relevant tables are already available in a pre-joined unified dataframe called df. 2) Logically unnecessary for the transformation, such as aliasing or renaming of tables or columns — these do not affect the core logic.
6. Use fully qualified column names in the form table.column throughout the AT.

A valid AT must be complete, precise, and logically structured such that, when provided with the relevant schema, it can be automatically and unambiguously converted back into the original SQL query, with the exact structure and semantics preserved. I'll give you the following as input:
1. Schema: A Python list in which each element is a `table_name`.`column_name` string. It lists the tables and columns you may use in the AT.
2. SQL queries: A Python list in which each element is a PostgreSQL query. These are the SQL queries you need to convert to AT. They are executed sequentially.


schema = {schema}
sql = {sql}

Now generate the valid step-by-step action trajectory for the given input. Do not give any extra explanation or comments. Use \\n\\n to separate the AT for each SQL in the SQL list. Wrap your answer in the following format: ```AT\\n[Your Answer]\\n```
```AT
"""
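
# A minimal sketch of how a template such as sql2traj_user_prompt might be
# filled in. The templates use str.format placeholders ({schema}, {sql}), which
# is why literal braces inside them are doubled. The helper name render_prompt
# is hypothetical and not part of the original code; it only adds an early,
# explicit check for missing fields before formatting.
from string import Formatter


def render_prompt(template: str, **fields: object) -> str:
    """Fill a str.format-style template, failing loudly on missing fields."""
    # Formatter.parse yields (literal, field_name, spec, conversion) tuples;
    # field_name is None for pure literal text and for escaped braces.
    required = {name for _, name, _, _ in Formatter().parse(template) if name}
    missing = required - set(fields)
    if missing:
        raise KeyError(f"missing prompt fields: {sorted(missing)}")
    return template.format(**fields)


# Example usage (the schema and query here are illustrative only):
# prompt = render_prompt(
#     sql2traj_user_prompt,
#     schema=["book.title"],
#     sql=["SELECT title FROM book;"],
# )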


sql2traj_few_shot_user_prompt = """Action Trajectory (AT) is a piece of pandas-like code that serves as an intermediate representation between natural language and SQL. AT abstracts the logical structure of SQL in a readable, sequential format. Each line of AT is an atomic data transformation operation using pandas-like syntax. Your task is to convert a list of SQL queries (including SELECT, INSERT, UPDATE, MERGE, etc.) into step-by-step AT using a pandas-like API. IMPORTANT: The action trajectory must accurately reflect the logical flow of the original SQL, and it should be detailed and precise enough that it can be programmatically converted back into exactly the same original SQL query.

General Rules of AT:
1. Only Select What's Asked: Only include columns explicitly requested in the SELECT clause of the SQL. Do not select or project any extra/unrequested columns.
2. Step-by-Step Reasoning: Break down the SQL query into logical steps using pandas-like chaining. Do not generate additional comments. Assign intermediate results to df1, df2, etc., with the final result stored in res.
3. Translate every SQL query into an action, including CREATE, DROP, ALTER, etc. Never skip a query or say “no trajectory needed”; every query must result in a symbolic action. For example, you could treat the 'create index' action as 'res = df.create_index(on=book.title, name='idx_book_title')'
4. SQL functions follow pandas syntax: Action trajectory is NOT real pandas code. You are NOT executing dataframes. The pandas-like syntax is used symbolically to represent SQL logic. Each function in AT should correspond to a specific SQL keyword or clause (`select`, `where`, `groupby`, etc.) following pandas syntax, and each line reflects one atomic SQL operation. For instance, select(element=...) for SELECT, filter(cond=...) for WHERE.
5. Focus only on the semantic logic and transformative steps of the query. Omit actions that are: 1) Structurally necessary but semantically irrelevant, such as JOIN, MERGE, CTEs or subquery wrapping — assume that all needed columns from relevant tables are already available in a pre-joined unified dataframe called df. 2) Logically unnecessary for the transformation, such as aliasing or renaming of tables or columns — these do not affect the core logic.
6. Use fully qualified column names in the form table.column throughout the AT.

A valid AT must be complete, precise, and logically structured such that, when provided with the relevant schema, it can be automatically and unambiguously converted back into the original SQL query, with the exact structure and semantics preserved. I'll give you the following as input:
1. Schema: A Python list in which each element is a `table_name`.`column_name` string. It lists the tables and columns you may use in the AT.
2. SQL queries: A Python list in which each element is a PostgreSQL query. These are the SQL queries you need to convert to AT. They are executed sequentially.

Here are some few-shot examples to help you better understand the task.
\"\"\"
### Instance 1
schema = ['titles.title_id', 'titles.title', 'titles.pub_id', 'sales.qty', 'sales.title_id', 'publishers.pub_id', 'publishers.pub_name']
sql = ['SELECT\n t.title,\n p.pub_name,\n SUM(s.qty) AS total_sales\nFROM titles AS t\nINNER JOIN sales AS s ON t.title_id = s.title_id\nINNER JOIN publishers AS p ON t.pub_id = p.pub_id\nGROUP BY\n t.title,\n p.pub_name\nORDER BY total_sales DESC;\n']
at = ```AT
df1 = df.select(element=['titles.title', 'publishers.pub_name', 'sales.qty'])
df2 = df1.groupby(by=['titles.title', 'publishers.pub_name']).agg(total_sales=('sales.qty', 'sum'))
res = df2.sort_values(by='total_sales', ascending=False)
```

### Instance 2
schema = ['sales.stor_id', 'sales.ord_date']
sql = ['WITH CTE AS (\n    SELECT\n        STOR_ID AS USER_ID,\n        ORD_DATE AS TRANSACTION_DATE,\n        DENSE_RANK() OVER (\n            PARTITION BY STOR_ID\n            ORDER BY ORD_DATE DESC\n        ) AS RANKED_TRANSACT\n    FROM SALES\n)\n\nSELECT\n    TRANSACTION_DATE,\n    USER_ID,\n    COUNT(USER_ID) AS PURCHASE_COUNT\nFROM CTE\nWHERE RANKED_TRANSACT = 1\nGROUP BY\n    TRANSACTION_DATE,\n    USER_ID;\n']
at = ```AT
df1 = df.select(element=['sales.stor_id', 'sales.ord_date'])
df2 = df1.rename(columns={{'sales.stor_id': 'USER_ID', 'sales.ord_date': 'TRANSACTION_DATE'}})
df3 = df2.window(
    function='dense_rank',
    partition_by='USER_ID',
    order_by={{'TRANSACTION_DATE': 'desc'}},
    new_column='RANKED_TRANSACT'
)
df4 = df3.filter(cond='RANKED_TRANSACT == 1')
res = df4.groupby(by=['TRANSACTION_DATE', 'USER_ID']).agg(PURCHASE_COUNT=('USER_ID', 'count'))
```

### Instance 3
schema = []
sql = ["ALTER TABLE games ADD COLUMN created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP;\n\n\nCREATE OR REPLACE FUNCTION UPDATE_CREATED_DATE() RETURNS TRIGGER AS $BODY$\nBEGIN\n  IF NEW.created_date IS DISTINCT FROM OLD.created_date THEN\n    RAISE EXCEPTION 'Do not mess up with created_date';\n  ELSE\n    RETURN NEW;\n  END IF;\nEND;\n$BODY$ LANGUAGE plpgsql;\n\n\nCREATE TRIGGER check_update_created_date\nBEFORE\nUPDATE ON games\nFOR EACH ROW EXECUTE PROCEDURE UPDATE_CREATED_DATE();\n"]

at = ```AT
res = df.alter_table(table=games, add_column=created_date, dtype=TIMESTAMP, default=CURRENT_TIMESTAMP)


res = df.create_or_replace_function(name=UPDATE_CREATED_DATE, returns=TRIGGER, language=plpgsql, body=\"\"\"
BEGIN
  IF NEW.created_date IS DISTINCT FROM OLD.created_date THEN
    RAISE EXCEPTION 'Do not mess up with created_date';
  ELSE
    RETURN NEW;
  END IF;
END;
\"\"\")


res = df.create_trigger(name=check_update_created_date, event=BEFORE, operation=UPDATE, on_table=games, for_each_row=True, execute_procedure=UPDATE_CREATED_DATE())
```
\"\"\"

The above are only examples. Now generate the valid step-by-step action trajectory for the following input:
schema = {schema}
sql = {sql}

Do not give any extra explanation or comments. Use \\n\\n to separate the AT for each SQL in the SQL list. Wrap your answer in the following format: ```AT\\n[Your Answer]\\n```
```AT
"""
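
# A hedged sketch of parsing the model's reply to the two prompts above. It
# assumes the reply either repeats the ```AT fence or, because the prompt
# already ends with an open fence, continues directly and closes with ```.
# extract_at_block and split_trajectories are hypothetical helper names.
# Splitting on blank lines follows the "\n\n" separator the prompt asks for,
# and would misfire if a single AT (e.g. a function body) contained a blank line.
import re


def extract_at_block(response: str) -> str:
    """Return the AT text, with or without a repeated opening fence."""
    fenced = re.search(r"```AT\s*\n(.*?)```", response, flags=re.DOTALL)
    if fenced:
        return fenced.group(1).strip()
    # No opening fence: take everything before the first closing fence.
    return response.split("```", 1)[0].strip()


def split_trajectories(at_text: str) -> list[str]:
    """Split the AT text into one trajectory per SQL query."""
    return [part.strip() for part in re.split(r"\n\s*\n", at_text) if part.strip()]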


mask_at_user_prompt = """Action Trajectory (AT) is a piece of pandas-like code that serves as an intermediate representation between natural language and SQL. I will provide you with a piece of AT that shows the logic of the text-to-SQL process.
Your task is to mask the schema (related tables and columns) as [MASK] in the AT and keep only the logic template. DO NOT modify the logic in the original AT; only apply the masking.

```AT
{at}
```

Now mask the schema in the AT above and wrap your answer in the ```Masked AT\\n[Your Answer]\\n``` tag:
```Masked AT
"""
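
# A rough, rule-based approximation of the masking task described in
# mask_at_user_prompt, assuming every schema reference in the AT appears as a
# literal `table.column` string from a known schema list. The prompt itself
# delegates this to the model; mask_schema is a hypothetical helper shown only
# for illustration, and it will not mask aliases or bare table names.
import re


def mask_schema(at: str, schema: list[str]) -> str:
    """Replace each known `table.column` reference in the AT with [MASK]."""
    if not schema:
        return at
    # Try longer names first so 'sales.title_id' wins over 'sales.title'.
    names = sorted(schema, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)
    # The lookarounds keep a match from starting or ending inside a longer identifier.
    pattern = re.compile(r"(?<![\w.])(?:" + alternatives + r")(?![\w.])")
    return pattern.sub("[MASK]", at)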

fillin_user_prompt = """Action Trajectory (AT) is a piece of pandas-like code that serves as an intermediate representation between natural language and SQL. It shows the logical reasoning process of text-to-SQL. I'll provide you with:
1. Database Schema: The database schema in CREATE DDL format.
2. Highlighted Schema: A Python list of a subset of tables and columns in the format `table`.`column`. Treat it as a guess at the schema used in the ground-truth SQL for this text-to-SQL generation or debugging process. It is not always correct: it may contain irrelevant schema elements that could lead to errors in the subsequent SQL generation, or it may miss truly related ones.
3. User Query: the natural language user query you need to solve in the text-to-SQL generation or debugging process.
4. Masked AT: An AT with the schema masked, leaving only the reasoning steps in text-to-SQL.
Your task is to refer to all the provided information and fill in the correct schema at the [MASK] positions in the masked AT. \
The complete AT should accurately reflect the reasoning process that generates the SQL capable of correctly answering the question. 
DO NOT modify the logical template in the masked AT; you are only allowed to fill in the schema.

### Database Schema
\"\"\"
{schema}
\"\"\"

### Highlighted Schema
highlighted_schema = {highlighted_schema}

### User Query
\"\"\"
{question}
\"\"\"

### Masked AT
\"\"\"
{masked_at}
\"\"\"

Now, fill in the masked AT and do not give me any comment or explanation. Wrap your answer with ```AT\\n[Your Answer]\\n``` tags in your response:
```AT
"""


traj2traj_prompt = """Action Trajectory (AT) is a piece of pandas-like code that serves as an intermediate representation between natural language and SQL. An effective AT should accurately reflect the logic of the text-to-SQL process and help the subsequent generation of SQL that answers the question accurately.
I will provide you with:
1. Schema: A Python list in which each element is a `table_name`.`column_name` string. It lists the tables and columns you may use in the AT.
2. Column description: For each column in the schema, a description of the column's meaning, type, and example values.
3. Primary key list: A Python list of all primary keys in this database, each in the format of a `table_name`.`column_name` string.
4. Foreign key dict: A Python dict whose keys are source columns and whose values are target columns. All columns are in the format of a `table_name`.`column_name` string.
5. Question: The natural language question you need to answer in the text-to-SQL process.
6. AT: An AT that shows the logic of the text-to-SQL process in the context of the schema and question. It may contain errors that could lead to errors in the subsequent SQL generation.
 
Your task is to check the given AT and modify it when needed. The final goal is to generate a valid AT that reflects the accurate text-to-SQL logic based on the schema, column description, and question. Later, the modified AT will be converted to SQL.
Please note that:
1. In the generated AT, select only what is requested in the question. Do not select anything that is not requested.
2. The filter condition in the 'where' function may not directly match the text in the question. To find the correct value for the 'where' function, refer to the example values or the full list of possible values in the column description.

schema = {schema}
```column description
{column_description}
```
pk_list = {pk_list}
fk_dict = {fk_dict}
question = \"\"\"
{question}
\"\"\"
```AT
{at}
```

Now generate the valid AT that displays the reasoning process of generating SQL that can accurately answer the question. Please wrap your corrected AT with ```AT\\n[Your Answer]\\n``` tags in your response:
```AT
[Your Answer]
```"""


traj2sql_user_prompt = """You are an expert in text-to-SQL. I will provide you with:
1. Database schema: Includes schema and foreign_keys. Schema is a Python list in which each element is a `table_name`.`column_name` string. It lists the tables and columns you may use in the AT. foreign_keys is a dictionary that shows the foreign key relationships among tables.
2. Column description: For each column in the schema, a description of the column's meaning, type, and example values.
3. Primary key list: A Python list of all primary keys in this database, each in the format of a `table_name`.`column_name` string.
4. Foreign key dict: A Python dict whose keys are source columns and whose values are target columns. All columns are in the format of a `table_name`.`column_name` string.
5. Question: The natural language question you need to answer in the text-to-SQL process.
6. AT: A piece of pandas-like code that serves as an intermediate representation between natural language and SQL. It shows a possible reasoning process for this text-to-SQL task.

Your task is to convert the given AT to valid PostgreSQL queries according to the database schema, column description, primary keys, foreign keys, and question. The SQL should be syntactically valid and should answer the question accurately.
To help you better understand the AT, please note that:
1. You need to generate the correct JOIN clauses yourself.
2. The filter condition in the 'where' function may not directly match the text in the question. To find the correct value for the 'where' function, refer to the example values or the full list of possible values in the column description.
3. In each generated SQL query, select only what is requested in the question. Do not select anything that is not requested.

```column description
{column_description}
```
pk_list = {pk_list}
fk_dict = {fk_dict}
question = \"\"\"
{question}
\"\"\"
```AT
{at}
```

Now generate valid PostgreSQL queries. You may use more than one query; separate them with \\n\\n. You can think step by step, but wrap only your final answer, without any comments or explanation, in the ```sql\\n[Your Answer]\\n``` tags. I'll split the answer into a list and run the queries directly.
```sql
[Your Answer]
```
"""
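
# A sketch of the post-processing that the prompt's last instruction implies:
# pull the ```sql block out of the reply and split it on blank lines into
# individual statements. extract_sql_list is a hypothetical name. Taking the
# last fenced block is an assumption based on the prompt allowing step-by-step
# reasoning before the final answer, and splitting on blank lines breaks if a
# single statement (for example a function body) itself contains a blank line.
import re


def extract_sql_list(response: str) -> list[str]:
    """Return the statements found in the last ```sql fence of the response."""
    blocks = re.findall(r"```sql\s*\n(.*?)```", response, flags=re.DOTALL)
    if not blocks:
        return []
    return [stmt.strip() for stmt in re.split(r"\n\s*\n", blocks[-1]) if stmt.strip()]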


trunc_schema_baseline_prompt = """You are an expert in text-to-SQL. I will provide you with:
1. Database schema: Includes schema and foreign_keys. Schema is a Python list in which each element is a `table_name`.`column_name` string. It lists the tables and columns you may use in the SQL. foreign_keys is a dictionary that shows the foreign key relationships among tables.
2. Column description: For each column in the schema, a description of the column's meaning, type, and example values.
3. Primary key list: A Python list of all primary keys in this database, each in the format of a `table_name`.`column_name` string.
4. Foreign key dict: A Python dict whose keys are source columns and whose values are target columns. All columns are in the format of a `table_name`.`column_name` string.
5. Question: The natural language question you need to answer in the text-to-SQL process.
6. Problematic SQL: The PostgreSQL query you need to correct.
Your task is to understand the user's issue and correct the problematic SQL given the database schema.
```column description
{column_description}
```
pk_list = {pk_list}
fk_dict = {fk_dict}
question = \"\"\"
{question}
\"\"\"
problematic_sql = {issue_sql}


Now generate valid PostgreSQL queries. You may use more than one query; separate them with \\n\\n. You can think step by step, but wrap only your final answer, without any comments or explanation, in the ```sql\\n[Your Answer]\\n``` tags. I'll split the answer into a list and run the queries directly.
```sql
[Your Answer]
```
"""

pure_baseline_prompt = """You are a SQL assistant. Your task is to understand the user's issue and correct their problematic SQL given the database schema. Please wrap your corrected SQL with ```sql\\n[Your Fixed SQL]\\n``` tags in your response.

# Database Schema:
{schema}

# User issue:
{user_issue}

# Problematic SQL:
{error_sql}

# Corrected SQL:
"""