{
  "qa_opt": "You are a helpful assistant tasked with generating analytical questions for a dataset containing multiple tables. Each table described below is a sample from the original. The dataset includes the following tables with other metadata.\n\n{datasetInfo}\n\nAssume that necessary libraries has been imported and all tables have been loaded into pandas DataFrames with the names specified above - {dfStmt}. Here is an example:\n\nExample Question: What is the sum of the rows in each table?\n\nExample Answer:\n\n```python\nsum_rows = {codeStr}\nprint('The sum of the rows is', sum_rows)\n```\n\nBased on the dataset, generate several analytical questions that require data from multiple tables to formulate conclusions. For each question, provide Python code snippets to calculate the answers, ensuring to mention specific DataFrame names as demonstrated in the example.\n",
  "sum": "You are an assistant responsible for summarizing key information from a dataset composed of several tables. Each table provided below is a representative sample from the original dataset, which includes additional metadata.\n\n{datasetInfo}\n\nPlease write an overview of this dataset. Do not explain columns' meaning again.\n",
  "viz": "You are a helpful assistant tasked with generating visualization questions for a dataset containing multiple tables. Each table described below is a sample from the original. The dataset includes the following tables with other metadata.\n\n{datasetInfo}\n\nAssume that necessary libraries has been imported, and all tables have been loaded into pandas DataFrames with the names specified above - {dfStmt}. Here is an example:\n\nExample Question: Create a bar chart that shows the number of rows in each table.\n\nExample Answer:\n\n```python\n{codeStr}\n```\n\nBased on the dataset, generate several visualizaion questions that require data from multiple tables to formulate conclusions. For each question, provide Python code snippets to do the visualization, ensuring to mention specific DataFrame names as demonstrated in the example.\n",
  "qa": "You are a helpful assistant in generating questions for given dataset. The dataset contains several tables. Each table showed below is sampled from the original table. Please ask several questions related to data analysis and provide Python codes to solve them. Assuming that tables has been saved in DataFrame. The generated questions must be analysed across multiple tables to reach a conclusion. The following is the dataset information.\n\n{datasetInfo}\n\nAssume that {dfStmt}. Here is an example\n\nQuestion 1: What is the sum of the rows of every tables?\n\nAnswer:\n\n```python\nsum_rows = {codeStr}\nprint('The sum of the rows is', sum_rows)\n```\n\nPlease try to generate some questions and answers now.\n",
  "sum-test": "You are an assistant responsible for summarizing key information from a dataset composed of several tables. Each table provided below is a representative sample from the original dataset, which includes additional metadata.\n\n{datasetInfo}\n\nPlease write an overview of this dataset. Do not explain columns' meaning again.\n",
  "sum_numerical": "You are an assistant responsible for summarizing key information from a dataset composed of several tables. Each table provided below is a representative sample from the original dataset, which includes additional metadata.\n\n{datasetInfo}\n\nPlease write an overview of this dataset. Provide as much meaningful numerical analysis as possible. For example, Do not explain columns' meaning again.\n",
  "fv": "Please summarize a statement from a question-answering pair. Here is an example\n\nQuestion: What is the average population density (`popden`) for species in the 'Patescibacteria' `phylum`?\n\nAnswer: 551.54\n\nStatement: The average population density (`popden`) for species in the 'Patescibacteria' `phylum` is 551.54.\n\nQuestion: Calculate the percentage of congress representatives from the Democrat party. Among them, how many postal points are in the Hawaii state?\n\nAnswer: 62.07 | 0\n\nStatement: The percentage of congress representatives from the Democrat party is 62.07. There are no postal points in the Hawaii state.\n\nQuestion: What are the states with an above-average female population?\n\nAnswer: PA | DE | MD | NC | FL | AL | TN\n\nStatement: The states with an above-average female population are PA, DE, MD, NC, FL, AL and TN.\n\nNow please try it by yourself.\n\nQuestion: {question}\n\nAnswer: {answer}",
  "qa_augmentation": "The following is a dataset.\n\n{datasetInfo}\n\nGiven the following question and answer, please transform it into a multiple-choice question with one correct option (as provided in the answer) and three plausible distractors. Here is an example.\n\nInput:\nQuestion: How many tables are there in this dataset?\nAnswer: {cb}\n\nOutput:\nQuestion: What is the capital city of France?\nOptions:\nA) {ca}\nB) {cb}\nC) {cc}\nD) {cd}\n\nInput:\nQuestion: {question}\nAnswer: {answer}\n\nOutput:\nQuestion: {question}\nOptions:",
  "fv_augmentation": "The following is a dataset.\n\n{datasetInfo}\n\nGiven a true statement, please refer to it and write three additional false statements. Here is an example.\n\nTrue Statement: There are {cb} tables in this dataset.\nFalse Statement:\nA) There are {ca} tables in this dataset.\nB) There are {cc} tables in this dataset.\nC) There are {cd} tables in this dataset.\n\nTrue Statement: {statement}\nFalse Statement:",
  "qa_format": "The following is a dataset.\n\n{datasetInfo}\n\nNow, please select the right choice for the given question.\n\nQuestion: {question}\n\nChoices (right answer {r}):\n\nA. {a}\n\nB. {b}\n\nC. {c}\n\nD. {d}",
  "fv_format": "The following is a dataset.\n\n{datasetInfo}\n\nNow, please select the right choices for the given question. There may be more than one right choice.\n\nQuestion: Which statement is **True** in the following statements?\n\nChoices (right answer {r}):\n\nA. {a}\n\nB. {b}\n\nC. {c}\n\nD. {d}\n\nE. There are no right choices above.",
  "qa_workflow": "You are a helpful assistant tasked with generating analytical questions for a dataset containing multiple tables. The dataset includes the following tables with other metadata.\n\n{datasetInfo}\n\nAssume that necessary libraries have been imported and all tables have been loaded into pandas DataFrames with the names specified above - {dfStmt}. Here is an example:\n\nExample Question: What is the sum of the rows in each table?\n\nExample Answer:\n\n```python\nsum_rows = {codeStr}\nprint(sum_rows)\n```\n\nBased on the dataset,  generate several analytical questions that require data from multiple tables to formulate conclusions.\n\nFor each question, please fulfill the following requirements\n\n- Ensure every question is formatted like `### Question n: some_question` . Do not split it into multiple lines.\n\n- Each question should be singular and centered on a specific analytical task. Do not combine multiple queries in a single question. For example, \"How many tables are there in this dataset? Which table has the most rows?\" is not allowed.\n\n- Do not include procedural steps or intermediate tasks within the question. For example, \"Count each table's rows, then determine which table has the most rows\" is not allowed.\n\n- The expected answer should be a single, straightforward result, such as a number or a string. Avoid complex data structures like DataFrame or Series as the final output.\n\n- **The question should require the use of multiple tables to infer the answer.** Focus on scenarios where combining data from different tables through relationships (e.g., foreign keys) is necessary to conclude.\n\nFor each conclusion, please fulfill the following requirements\n\n- The conclusion should be provided as Python code, formatted as demonstrated in the examples above.\n\n- Wrap the code in \"```python```\" to ensure clarity and formatting.\n\n- The code should include **only one** print statement to display the final result.\n\n- Ensure the printed output is a simple data type (number, string) and avoid complex outputs like DataFrame or Series.\n\n**Please make sure your questions and conclusion fulfill the requirements above**.\n",
  "qa_workflow_simple": "You are a helpful assistant tasked with generating questions for a dataset containing multiple tables. The tables are as follows.\n\n{datasetInfo}\n\nAssume that necessary libraries have been imported and all tables have been loaded into pandas DataFrames with the names specified above - {dfStmt}. Here is an example:\n\nExample Question: How many rows of the table `{tabName}` ?\n\nExample Answer:\n\n```python\nrows_count = len({tabName})\nprint(row_count)\n```\n\nBased on the dataset, generate several questions with relevant conclusions.\n\nFor each question, please fulfill the following requirements\n\n- Ensure each question is singular and focused on a specific analytical task. For example, \"How many tables are there in this dataset? Which table has the most rows?\" is not allowed. Since it contains several questions.\n\n- Do not show the steps for the solution in the question. For example, \"Count each table rows, which table has the most rows?\" is not allowed.\n\n- The expected answer should be a simple item such as a single number or a string, not complex data structures like DataFrame or Series.\n\nFor each conclusion, please fulfill the following requirements\n\n- Each conclusion is a Python code as demonstrated in the example above.\n\n- The code should include **only one** print statement to display the final result.\n\n- Ensure the printed output is a simple data type (number, string) and avoid complex outputs like DataFrame or Series.\n\nThe question you provided can be simple. But **please make sure your questions and conclusion fulfill the requirements above**.\n",
  "fv_workflow": "Here is a question with several answer choices. Please rewrite each answer choice into a complete statement that directly answers the question, and is relevant to it. For each statement:\n\n1. Ensure it is grammatically correct and makes logical sense.\n2. Start each statement with the corresponding letter from the original answer choices: `A)`, `B)`, `C)`, or `D)`.\n\nBelow is an example:\n\n**Question:** How many tables are there in this dataset?\n\n**Answer Choices:**\n\nA) 1  \nB) 2  \nC) 3  \nD) 4\n\n**Statements:**\n\nA) There is only one table in this dataset.  \nB) There are two tables in this dataset.  \nC) There are three tables in this dataset.  \nD) There are four tables in this dataset.\n\nNow, using the following inputs, transform each answer choice into a similar format. \n\n**Question:** {question}\n\n**Answer Choices:**\n\n{answers}\n\nPlease follow the steps carefully, and provide the final output with each statement starting with `A)`, `B)`, `C)`, or `D)` under the heading \"**Statements:**\".",
  "qa_workflow_test": "You are a helpful assistant tasked with answering multiple-choice questions based on a dataset containing several tables. The dataset includes the following tables:\n\n{datasetInfo}\n\nQuestion: {question}\n\nChoices:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nThis is a single-choice question, meaning there is only one correct answer. Please analyze the question step by step, referencing the relevant tables as needed. Conclude your response by stating the proper answer on a separate line in the format `Answer: A/B/C/D` .",
  "fv_workflow_test": "You are an intelligent assistant tasked with answering a multiple-choice question based on a dataset containing several tables. The dataset includes the following tables:\n\n{datasetInfo}\n\nQuestion: Select the correct statement(s) from the following options.\n\nChoices:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nE) None of the above statements are correct.\n\nThere may be more than one correct answer. Please analyze each statement step by step, referencing the relevant tables as needed. Conclude your response by stating the correct answer(s) on a separate line using the format: `Answer: ABCDE` (without spaces between the letters if selecting multiple options).",
  "retrieval_workflow_test": "You are an intelligent assistant responsible for selecting the most suitable table(s) to answer a given question. Below, you will find four tables:\n\n{datasetInfo}\n\nQuestion: {question}\n\nTask: Analyze the following tables carefully and determine which one(s) provide the necessary information to answer the question.\n\nOptions:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nE) Additional tables are required to fully answer the question.\n\nMore than one option may be correct. Please provide a detailed step-by-step analysis, and then state the final answer(s) on a separate line using the format: `Answer: ABCDE` (with no spaces between the letters if selecting multiple options).",
  "pot_workflow_test": "You are an assistant skilled in writing Python code to answer questions related to a dataset containing multiple tables. The dataset includes the following tables along with additional metadata:\n\n{datasetInfo}\n\nYour task is **not** to answer the question directly but to provide Python code that will do so. Assume that all necessary libraries have already been imported and that the tables are loaded into pandas DataFrames with the specified names - {dfStmt}.\n\nPlease follow these guidelines when writing the code:\n\n- Use the \"```python```\" block to enclose your code for clarity and proper formatting.\n\n- Include **only one** print statement at the end to output the final result.\n\nExample Question: What is the sum of the rows in each table?\n\nExample Answer:\n\n```python\nsum_rows = {codeStr}\nprint(sum_rows)\n```\n\nNow, let\u2019s proceed!\n\nQuestion: {question}",
  "cellLookup_workflow_test": "You are an intelligent assistant tasked with finding elements in a dataset. Tables in the dataset are provided in CSV format and structured as follows:\n\n{datasetInfo}\n\nQuestion: Locate the element `{element}` in the dataset.\n\nChoices:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nThe row and column indices start at **0** . Please approach this task step by step, analyzing the dataset and eliminating incorrect choices. Finally, provide the correct answer in a separate line using the format: `Answer: A/B/C/D` .",
  "reverseLookup_workflow_test": "You are an intelligent assistant tasked with finding elements in a table. The table is provided in CSV format and structured as follows:\n\n{tableInfo}\n\nQuestion: Find the element located in row {row}, column {col}.\n\nChoices:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nThe row and column indices start at **0** . Please approach this task step by step, analyzing the dataset and eliminating incorrect choices. Finally, provide the correct answer in a separate line using the format: `Answer: A/B/C/D` .",
  "dataint_workflow": "You are an expert in data integration. I will provide you with a row from a database table in JSON format, where the keys represent column names and the values represent the data in that row. Your task is to create a matching entity pair by constructing a similar instance, but with slight modifications to some of the elements, demonstrating an entity match scenario. Here\u2019s an example:\n\nOriginal instance:\n\n```json\n{\n  \"id\": 255,\n  \"name\": \"whirlpool white front load washer wfw9200swh\",\n  \"description\": \"whirlpool duet wfw9200sq white front load washer wfw9200swh 4.0 cu . ft. capacity 6th sense technology quiet wash plus noise reduction built-in water heater add-a-garment feature sanitary cycle 4 temperature selections white finish\",\n  \"price\": null\n}\n```\n\nTarget instance:\n\n```json\n{\n  \"id\": 255,\n  \"name\": \"whirlpool 27 ' duet washer horiz axis wp\",\n  \"description\": null,\n  \"price\": 910.91\n}\n```\n\nIn some cases, the provided instance might be too ambiguous to transform. If this occurs, output an empty dictionary. For example:\n\nOriginal instance:\n\n```json\n{\n  \"id\": 4613,\n  \"game_publisher_id\": 207,\n  \"platform_id\": 7,\n  \"release_year\": 2006\n}\n```\n\nTarget instance:\n\n```json\n{}\n```\n\nThe reason is that the original instance relies on additional tables in the database to make a meaningful match, and the `release_year` cannot be changed.\n\nNow, I will provide you with another instance. Your goal is to create a target instance based on the following requirements:\n\n- Keep any id-like items unchanged. These id-like items include terms such as `student_id`, `teacher_id`, and `location_id`, or any similar identifiers. Ensure that these items are preserved exactly as they are in the original row.\n- Do not modify the original column names; keep them consistent.\n- Some elements in the target instance can be set to null if necessary. If any elements are null in the original instance, try to reasonably infer and fill them.\n- For long text fields, try to rewrite the text while keeping the meaning similar and the logic intact.\n- For numeric values, consider adding slight variations to the numbers to create subtle but noticeable differences.\n- If a field is difficult to change (e.g., year), you may retain it, as long as the overall instance still differs.\n- For instances that are too challenging to transform, provide a specific explanation and return an empty dictionary `{}` as the answer.\n\nHere\u2019s the next instance for you to work on:\n\n```json\n{row}\n```\n\nPlease follow these steps carefully and answer the question step by step.",
  "dataint_workflow_test": "You are a helpful assistant tasked with identifying whether any of the entity pairs below represent the same entity. For each pair, analyze carefully and determine if they are identical. There may be more than one correct match.\n\nChoices:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nE) None of the above pairs are correct.\n\nInstructions:\n\n1. Analyze each pair step by step.\n2. If the pair represents the same entity, mark it as correct.\n3. If none of the pairs are correct, choose option E.\n4. Conclude your response by stating the correct answer(s) on a separate line in the format: `Answer: ABCDE` (without spaces between the letters if selecting multiple options).",
  "chartPerception_workflow": "You are a helpful assistant tasked with generating visualization tasks based on a dataset containing multiple tables, along with their metadata.\n\n{datasetInfo}\n\nAssume that necessary libraries have been imported and all tables have been loaded into pandas DataFrames with the names specified above - {dfStmt} .\n\nExample Task: Generate a bar chart to visualize the number of rows in different tables.\n\nExample Code:\n\n```python\ntableNames = {tableNames}\ntableRows = {tableRows}\nplt.bar(tableNames, tableRows)\n```\n\n### Task Instructions:\n\nBased on the dataset, create a series of visualization tasks that utilize data from multiple tables. Ensure the tasks meet the following requirements:\n\n1. **Task Formatting:**  \n   Format each task as `Task n: task_description` and avoid splitting the task description into multiple lines.\n\n2. **Task Complexity:**  \n   Since the dataset is large, avoid generating overly dense visualizations. For example, if a chart requires more than 20 bars, reduce the scale by applying appropriate filters or data analysis techniques before visualizing.\n\n3. **Chart Types:**  \n   Limit your selection of chart types to bar charts, line charts, pie charts, and scatter charts. Keep the visualizations simple; for instance, avoid multiple lines in a line chart or generate multiple charts at once.\n\n4. **Code Requirements:**  \n   - Provide the corresponding Python code for each task.\n   - Format the code within triple backticks (` ```python ``` `) for clarity.\n   - Do not include any commands like `plt.show()` or `plt.savefig()` in the code.\n   - You can perform data preprocessing or filtering using pandas before creating the final chart using `df.plot()`.\n\n5. **Pre-imported Libraries:**  \n   Pandas has been imported as `pd`, and `matplotlib.pyplot` has been imported as `plt`. Do not re-import them.\n\n**Ensure that each task and its corresponding code meets the above requirements.**",
  "chartPerception_workflow_test": "You are a helpful assistant tasked with selecting the most suitable chart based on the provided dataset and visualization requirements. The dataset is as follows:\n\n{datasetInfo}\n\nVisualization requirement: {task}\n\nInstructions:\n\n1. Analyze the dataset and the visualization requirements step by step.\n2. Identify which chart type best meets these criteria.\n3. Conclude your response by stating the correct answer on a separate line in the format: `Answer: A/B/C/D` .",
  "multiExpEM_workflow": "You are an expert in data integration. I will provide you with a single row from a database table in JSON format, where the keys represent the column names and the values represent the corresponding data. Your task is to transform this data into a concise and clear natural language description of the row. Here is an example for reference.\n\nOriginal instance (From database `Book`) :\n\n```json\n{\n  \"id\": 114514,\n  \"author\": \"Harper Lee\",\n  \"book\": \"To Kill a Mockingbird\"\n}\n```\n\nTarget instance:\n\n```txt\nThe author Harper Lee wrote the book `To Kill a Mockingbird`.\n```\n\nYour task is to follow these guidelines:\n\n1. The output must be a single, clear, and concise paragraph, enclosed in \"```txt```\" .\n2. Exclude any data that does not contribute meaningfully to the description (e.g., omitting the \"id\" in the example).\n3. Always aim for a natural and coherent sentence structure that is easy to understand.\n\nNow, here is the new instance for you to process  (From database `{database}`) :\n\n```json\n{row}\n```\n\nPlease follow these instructions carefully and generate the description accordingly. Please analyze this question step by step.",
  "multiExpEM_workflow_test": "You are a helpful assistant tasked with identifying whether any of the entity pairs below represent the same entity. For each pair, analyze carefully and determine if they are identical. There may be more than one correct match.\n\nChoices:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nE) None of the above pairs are correct.\n\nInstructions:\n\n1. Analyze each pair step by step.\n2. If the pair represents the same entity, mark it as correct.\n3. If none of the pairs are correct, choose option E.\n4. Conclude your response by stating the correct answer(s) on a separate line in the format: `Answer: ABCDE` (without spaces between the letters if selecting multiple options).",
  "tableAugmentation_workflow_test": "You are a helpful assistant tasked with determining whether the given table can be joined with other tables based on common keys or relevant relationships. The dataset information is provided below:\n\n{datasetInfo}\n\nDetermine which of the following table(s) can be directly joined to the table `{tableName}` :\n\nChoices:\n\nA) {ca}\n\nB) {cb}\n\nC) {cc}\n\nD) {cd}\n\nE) None of the above tables can be joined.\n\nInstructions:\n\n1. Carefully examine the columns and relationships in each choice.\n2. Determine if there are common fields, foreign keys, or other relevant attributes that support joining.\n3. State your conclusion on a new line in the format: `Answer: ABCDE` (without spaces between letters if selecting multiple options)."
}