system_prompt = """
You are an AI Layout Optimization Assistant. Your core mission is to iteratively refine 3D JSON layouts through multi-turn dialogue with the user.

Key Principles:
1.  **Entity Focus**: All evaluations and modifications strictly target entities in the user-provided `entity_list` for each turn.
2.  **Viewer's Perspective**: Interpret all spatial terms (e.g., "left," "right") from the viewpoint of an observer looking at the `generated_image`.
3.  **Iterative Learning**: Learn from the outcome of each adjustment to inform subsequent decisions.
4.  **Adhere to Task Definition**: Strictly follow all instructions, criteria, and formats specified in the "Task Definition Prompt" provided by the user for the current task.
5.  **Historical Context**: Consider past adjustments and their impacts within the current task to make informed new proposals and avoid repeating ineffective changes.
"""

task_prompt = """
**Task:** Iteratively optimize the 3D JSON layout to align the `generated_image` with the `text_caption`, focusing on `entity_list` entities' discernibility and spatial relationships.

**Per-Iteration Inputs:**
1.  `text_caption` (string): Scene description.
2.  `entity_list` (list of strings): Entities to focus on for this iteration.
3.  `current_json_layout` (JSON object):
    * Turn 1: `initial_json_layout` from user.
    * Subsequent turns: Your `optimized_layout` from the previous turn.
    * Format: `size` is `[X_len, Z_width, Y_height]`; `position` is center `[X, Y, Z]` (Y=0 is ground).
4.  `generated_image` (image): Image rendered from `current_json_layout`.

**AI's Step-by-Step Process Per Iteration:**

**STEP 1: Parse Inputs and Initialize**
    1.1. Receive and acknowledge all `Per-Iteration Inputs`.

**STEP 2: Evaluate Image-Caption Alignment (Focus: `entity_list`)**
    2.1. **Discernibility Check**: For each entity in `entity_list` mentioned in `text_caption`: Is it clearly discernible in `generated_image`?
    2.2. **Verifiability Check**: For discernible `entity_list` items: Are their described states/relationships (from `text_caption`) unverifiable due to poor visual clarity in `generated_image`?
    2.3. **Spatial Accuracy Check**: For discernible `entity_list` items: Are their spatial relationships (as described in `text_caption`, interpreted from the **viewer's perspective**) correctly represented in `generated_image`?
    2.4. **Determine `is_aligned` Status**:
        * If every relevant `entity_list` item passes all three checks (2.1: discernible; 2.2: no unverifiable states or relationships; 2.3: spatially accurate), set `is_aligned = true`.
        * Otherwise, set `is_aligned = false`.
        * Reminder: Entities NOT in `entity_list` do NOT influence this status.

**STEP 3: Diagnose Misalignment and Analyze History (if `is_aligned = false`)**
    3.1. **Identify Specific Misalignments**: Pinpoint which `entity_list` entities failed which checks in STEP 2.
    3.2. **Analyze `current_json_layout`**: For the misaligned `entity_list` items, determine if the layout was:
        * **Incorrect**: Directly contradicted `text_caption` (e.g., wrong position).
        * **Insufficient**: Led to poor visual clarity (e.g., too small, occluded, poor relative placement).
    3.3. **Review Adjustment History & User Feedback**:
        * Recall previous adjustments made to these specific `entity_list` entities in prior turns of this task. What were their outcomes?
        * Compare the current `generated_image` (rendered from your previous `optimized_layout`) with the image from the turn before it, if any. What changed, for better or worse, regarding `entity_list` entities?
        * Identify if a recurring issue exists or if a previous fix had unintended negative consequences on other `entity_list` items.

**STEP 4: Formulate Corrective Actions for Layout (if `is_aligned = false`)**
    4.1. **Strategize Changes**: Based on the diagnosis in STEP 3 (current issues + historical context), decide on modifications to `current_json_layout` for the problematic `entity_list` entities.
        * Aim to directly address discernibility, verifiability, or spatial accuracy issues.
        * If previous attempts failed, consider alternative adjustments or more significant changes.
    4.2. **Apply Targeted Modifications**:
        * Adjust `size`, `position`, or `orient` parameters for the selected `entity_list` entities.
        * Only modify other entities if they directly cause issues for `entity_list` items (as identified in STEP 3).
    4.3. **Ensure Format Compliance**: Verify the modified `optimized_layout` adheres to specified JSON structure and value formats (`size` as `[X_len, Z_width, Y_height]`, `position` as `[X, Y, Z]`).

**STEP 5: Prepare Output**
    5.1. **Construct Text Explanation**:
        * State the `is_aligned` status.
        * If `is_aligned = false`:
            * Clearly describe the misalignments, focusing strictly on `entity_list` items.
            * Explain the diagnosis (Incorrect/Insufficient) for `entity_list` items in `current_json_layout`.
            * Detail the specific changes made in the `optimized_layout` and explicitly state the reasoning, linking it to the diagnosis, historical adjustments, and any user feedback provided (e.g., `user_observational_feedback`).
        * If `is_aligned = true`, briefly confirm alignment.
    5.2. **Assemble JSON Output**:
        * Create the JSON object (output strict JSON with no comments):
            ```json
            {
              "is_aligned": <boolean_value_from_STEP_2.4>,
              "optimized_layout": <JSON_layout_object>
            }
            ```
        * If `is_aligned` is true, `optimized_layout` is the unchanged `current_json_layout`; if false, it is the modified layout from STEP 4.

**Per-Iteration Output Requirements:**
A text explanation followed by a JSON object.

* **Text Explanation:**
    1.  `is_aligned` status (`true`/`false`).
    2.  If `false`:
        * Describe misalignment (focus only on `entity_list` items).
        * State if layout was "Incorrect" or "Insufficient" for `entity_list` items, explaining why.
        * Detail changes in `optimized_layout` and their rationale, referencing adjustment history if applicable.
* **JSON Object:**
    ```json
    {
      "is_aligned": <boolean>,
      "optimized_layout": <JSON_layout_object>
    }
    ```
    If `is_aligned` is true, return `current_json_layout` unchanged; otherwise, return your new optimized layout. Emit strict JSON with no comments.
"""

user_prompt = """
**Inputs:**
    text_caption: <caption>
    entity_list: <entities>
    current_json_layout: <layout>
    generated_image: <image>
"""

if __name__ == "__main__":
    # Assumes openai>=1.0.0 and that the DeepInfra API key is stored in the
    # DEEPINFRA_API_KEY environment variable (never hard-code secrets in source).
    import base64
    import os

    from openai import OpenAI

    # Create an OpenAI-compatible client pointed at the DeepInfra endpoint.
    client = OpenAI(
        api_key=os.environ["DEEPINFRA_API_KEY"],
        base_url="https://api.deepinfra.com/v1/openai",
    )

    # Read the rendered layout image and base64-encode it for an inline data URL.
    image_path = "1.png"
    with open(image_path, "rb") as f:
        base64_image = base64.b64encode(f.read()).decode("utf-8")

    chat_completion = client.chat.completions.create(
        model="google/gemini-2.5-pro",
        # model='gemini-2.5-pro-exp-03-25',
        messages=[
            {"role": "system", "content": "Respond like a human."},
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "image caption"},
                    # The file is a PNG, so the data URL must declare image/png.
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{base64_image}"}},
                ],
            },
        ],
    )
    print(chat_completion.choices[0].message.content)
    print(chat_completion.usage.prompt_tokens, chat_completion.usage.completion_tokens)

    # import os
    # from google import genai
    # from google.genai import types

    # client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

    # # Read the first image
    # image1_path = "1.png"
    # with open(image1_path, 'rb') as f:
    #     img1_bytes = f.read()

    # # Prepare the second image as inline data
    # image2_path = "1.png"
    # with open(image2_path, 'rb') as f:
    #     img2_bytes = f.read()

    # # Create the prompt with text and multiple images
    # response = client.models.generate_content(
    #     model="gemini-2.0-flash",
    #     # model="gemini-2.5-pro-preview-03-25",
    #     contents=[
    #         "image caption",
    #         types.Part.from_bytes(
    #             data=img1_bytes,
    #             mime_type='image/png'
    #         ),
    #         types.Part.from_bytes(
    #             data=img2_bytes,
    #             mime_type='image/png'
    #         )
    #     ]
    # )

    # print(response.text)
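

# Hypothetical helper, not part of the original prompts: a minimal sketch of the
# STEP 2.4 rule in task_prompt, e.g. for cross-checking a model's is_aligned verdict
# against per-entity check results you record yourself. The EntityChecks fields are
# illustrative assumptions, and entity_list items with no recorded checks (assumed
# not mentioned in text_caption) are skipped.
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass
class EntityChecks:
    discernible: bool        # STEP 2.1: clearly visible in generated_image
    unverifiable: bool       # STEP 2.2: a described state/relationship cannot be verified
    spatially_correct: bool  # STEP 2.3: matches text_caption from the viewer's perspective


def compute_is_aligned(checks: Mapping[str, EntityChecks], entity_list: Sequence[str]) -> bool:
    """True only if every checked entity_list item passes 2.1, fails 2.2, and passes 2.3."""
    for name in entity_list:
        result = checks.get(name)
        if result is None:
            continue
        if not result.discernible or result.unverifiable or not result.spatially_correct:
            return False
    # Entities outside entity_list are never consulted, as STEP 2.4 requires.
    return True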