import json
from tqdm import tqdm
from transfer_openai import send_request
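
# A minimal sketch of a bounded retry wrapper, as an alternative to retrying a
# failed request forever. `request_with_retry` and its parameters are
# hypothetical names, not part of `transfer_openai`; the sketch assumes only
# that the request function returns None on failure, as the retry loop further
# down implies.
# Possible usage:
#   request_with_retry(send_request, prompt, model='gpt-3.5-turbo', temperature=0.7)
import time


def request_with_retry(send, prompt, max_attempts=5, base_delay=1.0, **kwargs):
    """Call send(prompt, **kwargs) until it returns a non-None value.

    Waits base_delay * 2**attempt seconds between attempts and raises
    RuntimeError once max_attempts calls have all returned None.
    """
    for attempt in range(max_attempts):
        response = send(prompt, **kwargs)
        if response is not None:
            return response
        # Back off before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f'no response after {max_attempts} attempts')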

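# NOTE: `roles` is loaded here but never read below; the referee prompts are
# hard-coded in `role_names` and `role_descriptions`.
with open('../prompts/roles.json', 'r') as f:
    roles = json.load(f)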

# Sampling temperature per discussion round, indexed by round number.
temperatures = [0.7, 0.9]
# The third name is 'Scientist' so that it matches the third role description.
role_names = ['Critic', 'General Public', 'Scientist']
# Persona text substituted for the `[role]` placeholder; order matches `role_names`.
role_descriptions = [
    'You are now Critic, one of the referees in this task. '
    'You will check fluent writing, clear sentences, and good wording in summary writing. '
    "Your job is to question others' judgment to make sure their judgment is well-considered "
    'and offer an alternative solution if two responses are at the same level.',
    'You are now General Public, one of the referees in this task. '
    'You are interested in the story and looking for updates on the investigation. '
    'Please think critically by yourself and note that it’s your responsibility '
    'to choose one of which is the better first.',
    'You are Scientist, one of the referees in this task. '
    'You are a professional engaged in systematic study who possesses a strong background '
    'in the scientific method, critical thinking, and problem-solving abilities. '
    'Please help other people to determine which response is the better one.',
]
    
with open('../benchmark/final_version/psy.json', 'r') as f:
    dataset = json.load(f)
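
# A minimal sketch of a stricter verdict parser, shown as an alternative to
# taking the first character after "My decision is ". `parse_decision` is a
# hypothetical helper, not part of the original pipeline; it assumes referees
# answer with "My decision is a", "... b", or "... tie" as the prompts request.
# Unlike the single-letter label stored by the loop below, it returns 'tie' in full.
import re


def parse_decision(response):
    """Return 'a', 'b', or 'tie' from a referee response, or None if absent."""
    # Allow optional backticks or quotes around the verdict and ignore case.
    match = re.search(r'my decision is\s*[`"\']*\s*(tie|a|b)\b', response, re.IGNORECASE)
    return match.group(1).lower() if match else None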
    
for data in tqdm(dataset):
    question = data['question']
    context = data['context']
    reference = data['reference']
    answer_a = data['student_answer_a']
    answer_b = data['student_answer_b']
    prompt_begin = f'''
[Context]: 
{context}
[Question]: 
{question}
[Reference]: 
{reference}
[Answer A]: 
{answer_a}
[Answer B]: 
{answer_b}
[Role]:
[role]
[System]:
You have received feedback from other referees who have analyzed the responses. Your task is to review this feedback critically and either support or challenge the previous judgments.
Consider if the previous decisions were made correctly. Reflect on whether the initial judgments align with the reference provided. If you disagree with the previous analysis, provide a new perspective and rationale.
Remember, your goal is to ensure the most accurate and fair judgment.
Now it’s your time to talk, please make your talk short and clear, []! You should return with ``My decision is a`` or ``My decision is b`` or ``My decision is tie`` with short and clear explanations.
'''
    prompt_later = f'''
[Context]: 
{context}
[Question]: 
{question}
[Reference]: 
{reference}
[Answer A]: 
{answer_a}
[Answer B]: 
{answer_b}
[Role]:
[role]
[System]:
We would like to request your feedback on the performance of two assistants in response
to the user question displayed above based on the reference provided.
You should decide which one is better based on the reference. You should be critical and your opinion can not be exactly the same as others.
You have received feedback from other referees who have analyzed the responses. Your task is to review this feedback critically and either support or challenge the previous judgments.
Consider if the previous decisions were made correctly. Reflect on whether the initial judgments align with the reference provided. If you disagree with the previous analysis, provide a new perspective and rationale.
Remember, your goal is to ensure the most accurate and fair judgment. You should first analyze the previous feedback and then provide your own feedback.
Here is your discussion history:
[History]
Now it’s your time to talk, please make your talk short and clear, []! You should return with ``My decision is a`` or ``My decision is b`` or ``My decision is tie`` with short and clear explanations.
''' 
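    turn = 1  # number of discussion rounds; must not exceed len(temperatures)
    role_nums = len(role_names)  # one referee per role
    history = ""  # running transcript shown to later referees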
    for i in range(turn):
        temperature = temperatures[i]
        for j in range(role_nums):
            if i == 0 and j == 0:
                prompt = prompt_begin.replace('[]', f'{role_names[j]}').replace('[role]', role_descriptions[j])
            else:
                prompt = prompt_later.replace('[History]', history).replace('[]', f'{role_names[j]}').replace('[role]', role_descriptions[j])
            # Retry until a response arrives; `send_request` appears to return
            # None on failure. Note that this loop has no attempt limit.
            response = None
            while response is None:
                response = send_request(prompt, model='gpt-3.5-turbo', temperature=temperature)
            history += f'{role_names[j]}:\n {response}\n'
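    # Take the verdict from the last referee's response: 'a', 'b', or 't' (tie).
    # Store None instead of raising if the expected phrase is missing.
    _, marker, verdict = response.partition('My decision is ')
    verdict = verdict.strip()
    data['ChatEval'] = verdict[0].lower() if marker and verdict else None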
    data['ChatEval_history'] = history
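    # Rewrite the full output file after every item, so results collected so
    # far are already on disk if the run is interrupted.
    with open('../benchmark/final_version/psy_chateval.json', 'w') as f:
        json.dump(dataset, f, indent=4)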
