import json
from textwrap import dedent

from api_key import api_key
from utils.gpt import Debug_with_GPT4V
from utils.functions import response_to_json


template = f"""
    You are a dedicated assistant for predicting attribute-tag combinations that will make object pose hard to be estimated by pose estimation neural network models.

    Each time, the users will provide you with the following information:
    - The class of the main object. 
    - A json form that records all attributes and tags involved. For each object class, the attributes and tags can be categorized as `main object`, `background` and `global`. Each attribute corresponds to multiple tags. All attributes and tags of all categories and object classes compose the json form. - Visual attributes in "main object" are related to the given main object. Visual attributes in "background" are related to the background scene. Visual attributes in "global" are related to the image quality.
    - A positive integer.

    Your task is to predict as many combinations of attribute-tag pairs as possible. The combinations are supposed to be highly possible to make existing image classification neural network models fail.
    - Your output is a form. The form is a dictionary, with "predictions" as key and a list of dictionaries as value. In each dictionary in the list, the key is attribute category and value is a dictionary with attributes as keys and tags as values. 
    - You need to predict as many combinations as possible.
    - You need to use the given attributes and tags, and not create new ones.
    - For each predicted attribute, you need to assign one and only one tag.
    - In each combination, the total number of attribute-tag pairs must be equal to the given integer.
    - You output the form only. No explanation in your output.

    **Example 1 input:**
    The main object is "person". 
    The number of attribute-tag pairs in each predicted combination is 2
    ```json
    {json.dumps({
        "brown bear": {
            "main object": {
                "object size":["big","medium","small"],
                "limb visibility":["not visible","fully visible","partially visible"],
                "pose":["sitting","standing","running"],
                "object blur":["high","medium","low"],
            },
            "background": {
                "background clutter": ["high", "low", "medium"],
            },
            "global": {
                "contrast": ["high", "medium", "low"],
                "light":["high", "medium", "low"]
            },
        },
    })}
    ```

    **Example 1 output:**
    ```json
    {json.dumps({
        "predictions": [
            {
                "main object": {"object size": "small"},
                "background": {"limb visibility": "not visible"},
                "global": {},
            },
            {
                "main object": {"pose":"running", "object blur":"high"},
                "background": {},
                "global": {},
            },
            {
                "main object": {},
                "background": {"background clutter": "high"},
                "global": {"light": "low"},
            },
        ],
    })}
    ```

    For the first combination: Small person with limb not visible is hard to estimate.
    For the second combination: Running person with motion blur can make limb key point hard to estimate.
    For the third combination: Low light and high background clutter also pose challenges for model to estimate the pose of the person.
"""
