from __future__ import annotations
from typing import Any

PROMPTS: dict[str, Any] = {}

PROMPTS["DEFAULT_LANGUAGE"] = "English"
PROMPTS["DEFAULT_USER_PROMPT"] = "n/a"
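
# --- Illustrative sketch (not part of the original module) -------------------
# The extraction prompts in this module ask the model for a single JSON array
# of "ent", "rel", and "content_keywords" objects. A minimal way to split such
# a reply by type, and to apply the dedup rule stated in the continue prompt
# (case-insensitive, ignoring punctuation and extra spaces), might look like
# the code below. `normalize_name` and `split_extraction_output` are
# hypothetical names, and the exact normalization is an assumption.
import json
import re


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace for comparison."""
    no_punct = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(no_punct.split())


def split_extraction_output(raw: str, known_entities: tuple = ()) -> dict:
    """Parse the model's JSON array and group its objects by `type`."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    seen = {normalize_name(n) for n in known_entities}
    result = {"ent": [], "rel": [], "content_keywords": []}
    for item in items:
        kind = item.get("type") if isinstance(item, dict) else None
        if kind not in result:
            continue  # ignore objects with an unknown or missing type
        if kind == "ent":
            key = normalize_name(str(item.get("en", "")))
            if key in seen:
                continue  # skip entities that were already extracted
            seen.add(key)
        result[kind].append(item)
    return result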

PROMPTS["multimodal_entity_extraction_init"] = """---Goal---
Given a single primary page image, along with context about the document and a list of entity types, identify all important entities and relationships and return them in JSON format.
Use {language} as output language.

---Instructions---
1. **Process the Input:**
   - The input is a single page image.
   - From this image, capture:
     - All the **textual entities** derived from the text content of the page image.  
     - All the **visual entities** derived from figures, tables, or other visual components.  
   
2. **Entity Extraction:**
   For each identified entity, extract:
   - `type`: Always `"ent"`
   - `bbox`: For visual entities, provide the bounding box as a list [x1,y1,x2,y2] in image coordinates if available; otherwise use empty string "". For text-derived entities, use "".
   - `en`: Name in the same language as the input text (capitalize if English)
   - `et`: One of: [{entity_types}]
   - `ed`: 
       - If entity is from text: Comprehensive description of the entity's attributes and activities.
       - If entity is from a visual component: Description should focus on visual characteristics (appearance, layout, composition, style, notable visual features).
   - `es`: `"text"` if derived from text content, `"visual"` if derived from a visual component.

3. **Relationship Extraction:**
   - Extract relationships not only between text entities, but also between **text entities and visual entities**, as well as between visual entities themselves.  
   - For each pair of related entities:
     - `type`: Always `"rel"`
     - `se`: Name of the source entity
     - `te`: Name of the target entity
     - `rd`: Explanation of the relationship (e.g., a logo representing a company, a diagram illustrating a process described in text).
     - `rk`: Array of high-level keywords summarizing the relationship
     - `rs`: Numeric score for strength of the relationship

4. **Content Keywords:**
   - Add one object with:
     - `type`: `"content_keywords"`
     - `keywords`: Array of overarching themes or concepts

5. **Output Rules:**
   - Return a **single JSON array** containing all entity, relationship, and content keyword objects.
   - Output must be valid JSON, no extra commentary or formatting.
   - Output only in {language}.

######################
---Examples---
######################
{examples}

#############################
---Metadata about the document---
#############################
Entity types:
{entity_types}

Total Document Pages: {total_page}, Now on Page: {now_page}

######################
Output (JSON array only):
"""

PROMPTS["multimodal_entity_extraction_examples"] = [
  """Example 1:

Entity_types: ["time_period","geographical_location","civilization_or_empire","historical_concept_or_event","historical_figure","cultural_or_religious_movement","source_or_artifact","textbook_structure"]

(input.jpg)

Output:
[
    {{
        "type": "ent",
        "en": "Evolution’s Big Bang",
        "et": "historical_concept_or_event",
        "ed": "The term 'Evolution’s Big Bang' refers to the rapid diversification of life forms during the Cambrian Explosion, a period approximately 530-540 million years ago.",
        "es": "text",
        "bbox": ""
    }},
    {{
        "type": "ent",
        "en": "Cambrian Explosion",
        "et": "historical_concept_or_event",
        "ed": "The Cambrian Explosion was a period of rapid evolutionary diversification that occurred approximately 530-540 million years ago, marked by the emergence of many new animal phyla.",
        "es": "text",
        "bbox": ""
    }},
    {{
        "type": "ent",
        "en": "530-540 million years, B.C.",
        "et": "time_period",
        "ed": "This time period marks the era of the Cambrian Explosion, a significant event in Earth's history where diverse life forms emerged.",
        "es": "text",
        "bbox": ""
    }},
    {{
        "type": "ent",
        "en": "CS231n: Lecture 1 - 13",
        "et": "textbook_structure",
        "ed": "This refers to the structure of the CS231n course, specifically Lecture 1 out of 13, indicating the progression through the course material.",
        "es": "text",
        "bbox": ""
    }},
    {{
        "type": "ent",
        "en": "April 2, 2024",
        "et": "time_period",
        "ed": "The date indicates when the lecture took place or the slide was created.",
        "es": "text",
        "bbox": ""
    }},
    {{
        "type": "ent",
        "en": "Fei-Fei Li & Ehsan Adeli",
        "et": "historical_figure",
        "ed": "Fei-Fei Li and Ehsan Adeli are the instructors of the CS231n course, known for their contributions to the field of computer vision and deep learning.",
        "es": "text",
        "bbox": ""
    }},
    {{
        "type": "ent",
        "en": "Trilobite fossil",
        "et": "source_or_artifact",
        "ed": "A trilobite fossil is shown, representing the diverse life forms that existed during the Cambrian Explosion.",
        "es": "visual",
        "bbox": [175, 160, 644, 705]
    }},
    {{
        "type": "ent",
        "en": "Fossil",
        "et": "source_or_artifact",
        "ed": "The fossil depicted is the preserved remains of an organism from the Cambrian period, showcasing the diversity of life at that time.",
        "es": "visual",
        "bbox": [732, 144, 1068, 427]
    }},
    {{
        "type": "ent",
        "en": "Trilobite",
        "et": "source_or_artifact",
        "ed": "The trilobite is a marine arthropod that lived during the Cambrian period, its fossilized remains providing insights into early life forms.",
        "es": "visual",
        "bbox": [793, 485, 991, 717]
    }},
    {{
        "type": "rel",
        "se": "Evolution’s Big Bang",
        "te": "Cambrian Explosion",
        "rd": "‘Evolution’s Big Bang’ is an interpretive label used to describe the Cambrian Explosion.",
        "rk": ["terminology","synonym","historical-concept"],
        "rs": 9
    }},
    {{
        "type": "rel",
        "se": "Cambrian Explosion",
        "te": "530-540 million years, B.C.",
        "rd": "The Cambrian Explosion occurred during this geological time period.",
        "rk": ["occurs_in","time_period","geology"],
        "rs": 9
    }},
    {{
        "type": "rel",
        "se": "Trilobite fossil",
        "te": "Cambrian Explosion",
        "rd": "Trilobite fossils are evidence for the diversification of animal life during the Cambrian Explosion.",
        "rk": ["evidence","paleontology","example"],
        "rs": 8
    }},
    {{
        "type": "rel",
        "se": "Trilobite",
        "te": "Trilobite fossil",
        "rd": "The trilobite fossil is a preserved specimen of a trilobite.",
        "rk": ["depicts","specimen_of","visual_to_text"],
        "rs": 9
    }},
    {{
        "type": "rel",
        "se": "Fossil",
        "te": "Cambrian Explosion",
        "rd": "The fossil image illustrates organisms from the Cambrian period associated with the explosion of life.",
        "rk": ["illustrates","evidence","visual_link"],
        "rs": 7
    }},
    {{
        "type": "rel",
        "se": "Trilobite",
        "te": "Cambrian Explosion",
        "rd": "Trilobites are emblematic fauna that emerged in the Cambrian.",
        "rk": ["example_of","biota","evolution"],
        "rs": 8
    }},
    {{
        "type": "rel",
        "se": "CS231n: Lecture 1 - 13",
        "te": "Evolution’s Big Bang",
        "rd": "The lecture content introduces or discusses this historical concept as part of the course.",
        "rk": ["topic_of","curriculum","education"],
        "rs": 7
    }},
    {{
        "type": "rel",
        "se": "Fei-Fei Li & Ehsan Adeli",
        "te": "CS231n: Lecture 1 - 13",
        "rd": "These instructors are associated with the lecture.",
        "rk": ["instructor_of","authorship","teaching"],
        "rs": 8
    }},
    {{
        "type": "rel",
        "se": "April 2, 2024",
        "te": "CS231n: Lecture 1 - 13",
        "rd": "The lecture date or slide creation date.",
        "rk": ["date_of","temporal_context"],
        "rs": 7
    }},
    {{
        "type": "rel",
        "se": "Trilobite fossil",
        "te": "Fossil",
        "rd": "Both are fossil specimens from the Cambrian period; the trilobite fossil is a specific example.",
        "rk": ["same_category","specific_vs_general","paleontology"],
        "rs": 6
    }},
    {{
        "type": "rel",
        "se": "Fossil",
        "te": "Trilobite",
        "rd": "The fossil panel includes a specimen identified as a trilobite.",
        "rk": ["depicts","includes","specimen"],
        "rs": 8
    }},
    {{
        "type": "content_keywords",
        "keywords": ["Cambrian Explosion", "Evolution", "Fossils", "Trilobites", "Deep Learning", "Computer Vision"]
    }}
]"""
]
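
# --- Illustrative sketch (not part of the original module) -------------------
# One way the templates in this module might be rendered, shown under
# assumptions: the helper name `render_extraction_prompt` and its parameters
# are hypothetical. The examples escape JSON braces as "{{" / "}}", so this
# sketch passes them through str.format() once before substituting them into
# the main template, which is then formatted with the document metadata.
def render_extraction_prompt(
    template: str,
    examples: list,
    entity_types: list,
    language: str = "English",
    total_page: int = 1,
    now_page: int = 1,
    **extra: str,
) -> str:
    """Fill a prompt template with examples and document metadata."""
    # Un-escape "{{" / "}}" in the examples; they contain no placeholders.
    rendered_examples = "\n".join(examples).format()
    return template.format(
        language=language,
        entity_types=", ".join(entity_types),
        examples=rendered_examples,
        total_page=total_page,
        now_page=now_page,
        **extra,  # e.g. previous_extracted_entities for the continue prompt
    )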

PROMPTS["multimodal_entity_extraction_continue"] = """
You are given a single primary page image, along with context about the document.
MANY entities and relationships were missed in the last extraction, especially visual entities and cross-modal relationships between text entities and visual entities.
In this pass, you MUST (1) avoid duplicates from prior results, and (2) actively test the “potential” items below as hypotheses—ONLY adding them if there is textual or visual evidence in the current page or already-established document context.

---Metadata---

Entity_types:
{entity_types}

Total Document Pages: {total_page}, Now on Page: {now_page}

---Remember Steps---

1. Identify all entities in the given page image. For each identified entity, extract the following information:
- entity_name: Name of the entity, use same language as input text. If English, capitalize the name.
- entity_type: One of the following types: [{entity_types}]
- entity_description:
    - If entity comes from content text: a comprehensive description of the entity's attributes and activities.
    - If entity comes from a figure/image: emphasize visual features (appearance, layout, composition, notable visual details).
- entity_source (`es`): MUST be "text" if derived from content text, or "visual" if derived from an image.
- bbox: REQUIRED field. For visual entities, provide the bounding box as a list [x1,y1,x2,y2] in image coordinates if available; otherwise use empty string "". For text-derived entities, use "".

Format each entity as a JSON object with:
{{
  "type": "ent",
  "en": "<entity name>",
  "et": "<entity type>",
  "bbox": <[x1, y1, x2, y2] for visual entities, otherwise "">,
  "ed": "<description>",
  "es": "<text | visual>"
}}

⚠️ IMPORTANT — Deduping & Normalization:
- Do NOT extract entities whose "en" (case-insensitive; ignore punctuation and extra spaces) is already listed under "Previously Extracted Entities" in the Context section or under "existing_entities" if provided.
- Treat obvious variants/aliases (e.g., plural/singular, possessives like “Instructor’s Guide” vs “Instructor Answer Guide”) as the SAME entity—only output a single, canonical form as it appears in the document.

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are clearly related to each other.
For each pair of related entities, extract:
- source_entity: name of the source entity (exactly as output in step 1)
- target_entity: name of the target entity (exactly as output in step 1)
- relationship_description: a concise explanation grounded in textual/visual cues (cite the cue briefly, e.g., heading/label/phrase like “provided in”, “linked to”, figure caption reference, table proximity)
- relationship_strength (`rs`): numeric score in [0.0, 1.0] reflecting evidence strength (weak=0.2, moderate=0.5, strong=0.8+)
- relationship_keywords (`rk`): 1–4 high-level keywords summarizing the relationship’s nature

Format each relationship as a JSON object with:
{{
  "type": "rel",
  "se": "<source entity>",
  "te": "<target entity>",
  "rd": "<description>",
  "rk": ["<keyword1>", "<keyword2>", ...],
  "rs": <number between 0.0 and 1.0>
}}

⚠️ IMPORTANT — Relationship Rules:
- Do NOT extract relationships whose (source_entity, target_entity) pair already exists in "Previously Extracted Relations" in the Context section or in "existing_relationships" if provided (case/punctuation-insensitive match).
- Cross-modal relations (text ↔ visual) are encouraged when supported by captions, labels, callouts, or layout proximity.
- You MAY include relations supported by structural cues (e.g., table of contents, section headers, labeled subsections) with a lower rs (e.g., 0.3–0.5), but do NOT hallucinate: if no cue exists, omit.

3. Identify high-level keywords that summarize the main concepts, themes, or topics of the entire text/page. These should capture overarching ideas present in the document.
Format as a JSON object with:
{{
  "type": "content_keywords",
  "keywords": ["<keyword1>", "<keyword2>", ...]
}}

4. Combine all entities, relationships, and content keywords into a single valid JSON array.
Ensure the entire output is ONLY valid JSON and uses {language} as the language for all text fields.

5. Output ONLY the **missing and non-duplicated** entities, relationships, and content keywords. Do NOT include any commentary or formatting outside the JSON.

######################
---Examples---
######################
{examples}

######################
---Existing Results (from prior iteration)---
######################
Previously Extracted Entities (do NOT repeat):
{previous_extracted_entities}

Previously Extracted Relations (do NOT repeat) — shown as "src -> tgt":
{previous_extracted_relations}

Potential Missing Entities:
{potential_missing_entities}

Potential Relation Hints:
{potential_missing_relations}

######################
---Output---
######################
Output (JSON array only). Ensure it contains ONLY missing, non-duplicated items:
""".strip()
