import pandas as pd
import openai, os, json
import tiktoken

from langchain.chat_models import AzureChatOpenAI
from langchain import LLMChain

PATH = "F:/user-repos/saurasrivastava/prompt_gen"
PATH_DATA = f"{PATH}/data"


# read the prompt types, one line per prompt, e.g. "[A_B][C]" -> "A B, C"
with open(f"{PATH_DATA}/prompt_type.txt") as f:
    classes = [line.strip().replace("_", " ").replace("][", ", ").replace("[", "").replace("]", "") for line in f]
# read the prompts into a list, one per line
with open(f"{PATH_DATA}/good.txt", encoding="utf-8") as f:
    prompts = [line.strip() for line in f]

#all_classes = set([xx for x in classes for xx in re.findall("\[.*?\]", x)])
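
# A minimal sketch of what the commented-out line above appears to aim for:
# collecting the distinct bracketed tags. Assumption: it has to run on the raw
# lines of prompt_type.txt, before the brackets are stripped as is done for
# `classes`. `extract_tags` is a hypothetical helper, not part of the original.
import re


def extract_tags(raw_lines):
    """Return the set of distinct "[TAG]" strings found in raw_lines."""
    return {tag for line in raw_lines for tag in re.findall(r"\[.*?\]", line)}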

# specific output types - not every prompt would fall into all of these types so this is a
#    filtering mechanism
output_prompt = """Your task is to analyze the following prompts and their prompt types and suggest modifications, if any. For example, after analyzing a prompt you can suggest removing a prompt type from, or adding one to, its list of prompt types. You will be given a set of 111 examples; for each example, please analyze the text in full and reason about why it has been assigned its prompt types. If you feel that a prompt type is not needed, please remove it, and if you think something is missing, please add a prompt type. Also, remember that you can generate a new prompt type if needed. If you generate a new prompt type, please provide a description of it at the end. Next, I will provide the definitions of the existing prompt types for your reference. Remember that you can add, remove, or generate a new prompt type after analyzing the prompts critically.

1 [CLARIFICATION]: Such prompts have specified that the system should ask for a clarification question if needed.
2 [CODE_OUTPUT]: Such prompts involve code generation. The output generated by the prompt should be parsable, structured and formal.
3 [CONCATENATION]: Such prompts involve some type of concatenation operation on data structures.
4 [CONSTRAINED_OUTPUT]: Such prompts ask for a constrained output, such as 1) a constraint on the function name when generating code, 2) the number of sentences to generate when generating content, or 3) following certain criteria when generating outputs (such as if-else structures, etc.).
5 [CONTENT_GENERATION]: Such prompts involve some type of content generation, such as dataset, poem or story generation, etc. This is different from code output, as code output is a formal generation process and is structured. Content generation, on the other hand, can be unstructured and should focus on creativity and other forms of nuance in writing.
6 [CONTEXT]: Such prompts inform one to expect some type of contextual information, such as the partial state of a graph, partial text which has been divided into chunks, or a partial dialogue state. While generating such prompts, it must be understood that previous information is required to generate the correct output.
7 [DATE_UNDERSTANDING]: Such prompts should encourage one to apply reasoning that involves date-related operations, such as identifying a leap year, keeping in mind the number of days in each month, etc.
8 [DEDUCTIVE_REASONING]: Such prompts encourage the model to start with general principles and apply them to a specific instance given in the prompt.
9 [FORMATTED_OUTPUT]: Such prompts encourage one to generate output in a structured format such as JSON, tables, etc., so that it becomes easier to extract the output by parsing.
10 [GRAPH]: Such prompts involve some form of graph related operations.
11 [INFORMATION_EXTRACTION]: Such prompts encourage a system to analyze the input and extract the information which has been specified in the text. The information requested could be named entities, dates, locations, etc.
12 [INPUT MANIPULATION]: Such prompts encourage one to perform some type of manipulation on the input, such as adding or removing an element of a given data structure. They can also ask one to add or remove some information from the input text. Any type of manipulation on the input is allowed.
13 [MATHEMATICAL_REASONING]: Such prompts involve some type of mathematical reasoning, and to answer such queries one needs to think step by step, or break down the problem into smaller steps and then combine the solutions to get the answer. Mathematical reasoning prompts could involve other mathematical queries too, such as algebraic systems, high school mathematics questions, etc. To solve them, a prompt would encourage one to divide and conquer, provide a step-by-step explanation, and then answer the query.
14 [MULTI_HOP_QUERY]: Such prompts involve queries that require multiple information retrievals and then combining their outputs to reason and provide an answer. For such prompts, one should retrieve as many facts as possible from memory and then provide the output. It should be noted that while retrieving information from memory one should not produce made-up facts or hallucinations, and should only retrieve information one is confident about.
15 [MULTI_TURN_PROMPT]: Such prompts break down a candidate prompt into multiple prompts so that it becomes easier for one to solve an underlying task. Such prompts first solve a smaller portion of a query, and then generate and use the output iteratively to solve a complex task.
16 [PARAPHRASE]: Such prompts involve manipulation of the input to encourage generation of text which is similar to the input. It is different from input manipulation, as paraphrasing may involve generating content which extracts information from the input (such as dates, names, etc.), generating entirely new content, or a combination of both.
17 [PATTERN_IDENTIFICATION]: Such prompts are given some input-output pairs, and models are encouraged to identify the patterns, transformation functions, etc. that obtain the output from the input. Such prompts are very tough and should include as many details as possible to be effective.
18 [PLANNING]: Such prompts involve some type of planning before generating content. Planning can involve analyzing the environment or the constraints given in the input and design, applying critical thinking and reasoning to break down the problem, and then generating the output. Such prompts are useful for generating content for robots, mechanical devices, etc.
19 [RESPONSE_GENERATION]: Such prompts encourage dialogue response generation. They analyze the preceding information and then generate an output. It is different from the CONTEXT class, as certain constraints can also be specified in this class.
20 [ROLE_PLAYING]: This involves some type of role-playing or assigning a character to models. This could provide a persona to the model while generating output and can be useful for target audience.
21 [STATE TRACKING]: In the context of conversational AI, refers to the system's ability to remember and manage context-specific information throughout a conversation. This could include the user's inputs, previous responses by the AI, and any other data relevant to the current dialogue. It's crucial for maintaining context over multiple turns of a conversation, allowing the AI to provide coherent and contextually appropriate responses.
22 [STRATEGY QUESTION ANSWERING]: Such prompts involve identifying the queries provided in the input, extracting logical information or facts for the elements in the input, and then applying either mathematical or logical reasoning to answer a user query.
23 [SUMMARIZATION]: Such prompts encourage one to summarize the input. It is different from paraphrasing, as it involves shortening the input, and is different from input manipulation, as the input here is strictly unstructured text.
24 [TRANSLATION]: Such prompts involve translating from one language to another.
25 [ANALYSIS]: Such prompts encourage a model to perform some type of analysis before generating the output or the content. While analyzing the input, the model should pay close attention and apply reasoning and critical thinking before producing the output.

Now, once you have understood the prompt types, analyze the following examples to understand why each prompt has been assigned its specific prompt types. Each prompt can have multiple prompt types. You should output whether each prompt has been assigned the correct prompt types, whether something is missing, or whether a prompt type has to be removed. You can also generate a new prompt type if required and then add it to any of the examples below.

To do this, you should first go through all the definitions above and all the examples below, and then modify the prompt types only if required. Remember to provide the definition of any new prompt types you create. Also remember that I require output for all 111 examples. Enumerate them one by one. Even if no modification is required, provide an explanation. So for each example, give an explanation of whether you added, removed, or generated a new prompt type, or left it unchanged.

###Examples###
"""

# create giant mega-prompt listing the prompt and the types it falls under
for p, c in zip(prompts, classes):
    output_prompt += "###Prompt###\n" + p + "\n"
    output_prompt += "###Prompt Type###\n" + c + '\n\n'
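
# Sketch of the same loop as a reusable function (hypothetical helper, not in
# the original). Assumption: prompts and classes are meant to align one to one;
# zip() silently drops the extras when they do not, so this version raises.
def build_examples_block(prompts, classes):
    """Format each prompt with its prompt types, in the layout used above."""
    if len(prompts) != len(classes):
        raise ValueError(f"{len(prompts)} prompts but {len(classes)} prompt type lines")
    return "".join(
        f"###Prompt###\n{p}\n###Prompt Type###\n{c}\n\n"
        for p, c in zip(prompts, classes)
    )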


print(output_prompt)


delimiter = "###Better Prompt###"

openai.api_key = ""
openai.api_type = ""
openai.api_base =  ""
openai.api_version = ""

enc = tiktoken.encoding_for_model("gpt-4")

# tokenize the output prompt
guidelines_len = len(enc.encode(output_prompt))
print("Length: ", guidelines_len)
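
# Sketch of a budget check before the call (hypothetical helper). gpt-4-32k has
# a 32,768-token context window shared by the prompt and the completion; the
# 4,096 tokens reserved for the response is an assumed figure, adjust as needed.
def fits_context(n_prompt_tokens, context_window=32768, reserve_for_response=4096):
    """Return True if the prompt leaves at least the reserved room for a reply."""
    return n_prompt_tokens + reserve_for_response <= context_window

# Possible use: stop early if fits_context(guidelines_len) is False.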

llm = AzureChatOpenAI(deployment_name="gpt-4-32k",
                      model_name="gpt-4-32k", 
                      openai_api_key = openai.api_key, 
                      openai_api_base = openai.api_base, 
                      openai_api_version = openai.api_version)

response = llm.predict(output_prompt)
#print(response)

