# Template for self correction tasks --> parse the prompt
spot_object_template = """# Your Role: Excellent Parser

## Objective: Analyze scene descriptions to identify objects and their attributes.

## Process Steps
1. Read the user prompt (scene description).
2. Identify all objects mentioned with quantities.
3. Extract attributes of each object (color, size, material, etc.).
4. Ignore facing attributes (facing left, facing right, facing forward).
5. If the description mentions objects that shouldn't be in the image, note them in the negation part.
6. Explain your understanding (reasoning) and then format your result (answer / negation) as shown in the examples.
7. Importance of Extracting Attributes: Attributes provide specific details about the objects. This helps differentiate between similar objects and gives a clearer understanding of the scene.

## Examples

- Example 1
    User prompt: A brown horse is beneath a black dog. Another orange cat is beneath a brown horse.
    Reasoning: The description talks about three objects: a brown horse, a black dog, and an orange cat. We report the color attribute thoroughly. No specified negation terms. No background is mentioned and thus fill in the default one.
    Objects: [('horse', ['brown']), ('dog', ['black']), ('cat', ['orange'])]
    Background: A realistic image
    Negation: 

- Example 2
    User prompt: There's a white car and a yellow airplane in a garage. They're in front of two dogs and behind a cat. The car is small. Another yellow car is outside the garage.
    Reasoning: The scene has two cars, one airplane, two dogs, and a cat. The car and airplane have colors. The first car also has a size. No specified negation terms. The background is a garage.
    Objects: [('car', ['white and small', 'yellow']), ('airplane', ['yellow']), ('dog', [None, None]), ('cat', [None])]
    Background: A realistic image in a garage
    Negation: 

- Example 3
    User prompt: A car and a dog are on top of an airplane and below a red chair. There's another dog sitting on the mentioned chair.
    Reasoning: Four kinds of objects are described: one car, one airplane, two dogs, and one chair. The chair is red. No specified negation terms. No background is mentioned and thus fill in the default one.
    Objects: [('car', [None]), ('airplane', [None]), ('dog', [None, None]), ('chair', ['red'])]
    Background: A realistic image
    Negation: 

- Example 4
    User prompt: An oil painting at the beach of a blue bicycle to the left of a bench and to the right of a palm tree with five seagulls in the sky.
    Reasoning: Here, there are five seagulls, one blue bicycle, one palm tree, and one bench. No specified negation terms. The background is an oil painting at the beach.
    Objects: [('bicycle', ['blue']), ('palm tree', [None]), ('seagull', [None, None, None, None, None]), ('bench', [None])]
    Background: An oil painting at the beach
    Negation: 

- Example 5
    User prompt: An animated-style image of a scene without backpacks.
    Reasoning: The description clearly states no backpacks, so this must be acknowledged. The user provides the negative prompt of backpacks. The background is an animated-style image.
    Objects: [('backpacks', [None])]
    Background: An animated-style image
    Negation: backpacks

- Example 6
    User prompt: Make the dog a sleeping dog and remove all shadows in an image of a grassland.
    Reasoning: The user prompt specifies a sleeping dog in the image and shadows to be removed. The background is a realistic image of a grassland.
    Objects: [('dog', ['sleeping']), ('shadow', [None])]
    Background: A realistic image of a grassland
    Negation: shadows

- Example 7
    User prompt: A fire hydrant is back of a cat relative to observer. The cat is facing away from the observer.
    Reasoning: Two objects are described: one fire hydrant and one cat. Facing attributes are ignored. No specified negation terms. No background is mentioned and thus fill in the default one.
    Objects: [('fire hydrant', [None]), ('cat', [None])]
    Background: A realistic image
    Negation: 

Your Current Task: Follow the steps closely and accurately identify objects based on the given prompt. Ensure adherence to the above output format.

"""
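
# Illustrative sketch (not part of the original pipeline): one way to parse a reply
# that follows the "Objects / Background / Negation" layout requested by
# spot_object_template. The helper name `parse_spot_object_reply` and the returned
# dict layout are assumptions made for this example.
import ast


def parse_spot_object_reply(reply):
    """Split a model reply into its objects, background, and negation fields."""
    parsed = {"objects": [], "background": "", "negation": ""}
    for raw_line in reply.splitlines():
        line = raw_line.strip()
        if line.startswith("Objects:"):
            # The object list is a Python literal, so literal_eval can read it
            # without executing arbitrary code.
            parsed["objects"] = ast.literal_eval(line[len("Objects:"):].strip())
        elif line.startswith("Background:"):
            parsed["background"] = line[len("Background:"):].strip()
        elif line.startswith("Negation:"):
            parsed["negation"] = line[len("Negation:"):].strip()
    return parsed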

# Template for self correction tasks --> adjust the bounding boxes
spot_difference_template = """# Your Role: Expert Bounding Box Adjuster

## Objective: Manipulate bounding boxes in square images according to the user prompt while maintaining visual accuracy.

## Bounding Box Specifications and Manipulations
1. Image Coordinates: Define square images with top-left at [0, 0] and bottom-right at [1, 1].
2. Box Format: [Top-left x, Top-left y, Width, Height]
3. Operations: Include addition, deletion, repositioning, and attribute modification.

## Key Guidelines
1. Alignment: Follow the user's prompt, keeping the specified object count and attributes. Deem the layout incorrect if a described object lacks its specified attributes.
2. Boundary Adherence: Keep bounding box coordinates within [0, 1].
3. Minimal Modifications: Change bounding boxes only if they don't match the user's prompt (i.e., don't modify matched objects).
4. Overlap Reduction: Minimize intersections in new boxes and remove the smallest, least overlapping objects.

## Process Steps
1. Interpret prompts: Read and understand the user's prompt.
2. Implement Changes: Review and adjust current bounding boxes to meet user specifications.
3. Explain Adjustments: Justify the reasons behind each alteration and ensure every adjustment abides by the key guidelines.
4. Output the Result: Present the reasoning first, followed by the updated objects section, which should include a list of bounding boxes in Python format.

## Examples

- Example 1
    User prompt: A realistic image of landscape scene depicting a green car parking on the left of a blue truck, with a red air balloon and a bird in the sky
    Current Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207]), ('blue truck #1', [0.350, 0.368, 0.272, 0.208]), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176])]
    Reasoning: The prompt mentions a bird in the sky, but none is present. Add a bird, ensuring all coordinates and dimensions remain within [0, 1].
    Updated Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207]), ('blue truck #1', [0.350, 0.368, 0.272, 0.208]), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176]), ('bird #1', [0.385, 0.054, 0.186, 0.130])]

- Example 2
    User prompt: A realistic image of landscape scene depicting a green car parking on the right of a blue truck, with a red air balloon and a bird in the sky
    Current Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207]), ('blue truck #1', [0.350, 0.369, 0.272, 0.208]), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176])]
    Reasoning: The relative positions of the green car and blue truck do not match the prompt, and the bird is missing. Swap the positions of the green car and blue truck and add a bird in the sky, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('green car #1', [0.350, 0.369, 0.275, 0.207]), ('blue truck #1', [0.027, 0.365, 0.272, 0.208]), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176]), ('bird #1', [0.485, 0.054, 0.186, 0.130])]

- Example 3
    User prompt: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
    Current Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194]), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160]), ('blue dolphin #1', [0.158, 0.454, 0.376, 0.290])]
    Reasoning: The prompt mentions only one dolphin, but two are present. Thus, remove one dolphin to match the prompt, ensuring all coordinates and dimensions stay within [0, 1].
    Updated Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194]), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160])]

- Example 4
    User prompt: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
    Current Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194]), ('dolphin #1', [0.027, 0.324, 0.246, 0.160])]
    Reasoning: The prompt specifies a pink dolphin, but there's only a generic one. The attribute needs to be changed.
    Updated Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194]), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160])]

- Example 5
    User prompt: A realistic photo of a scene with a brown bowl on the right and a gray dog on the left
    Current Objects: [('gray dog #1', [0.186, 0.592, 0.449, 0.408]), ('brown bowl #1', [0.376, 0.194, 0.624, 0.502])]
    Reasoning: The leftmost coordinate (0.186) of the gray dog's bounding box is positioned to the left of the leftmost coordinate (0.376) of the brown bowl, while the rightmost coordinate (0.186 + 0.449) of the bounding box has not extended beyond the rightmost coordinate of the bowl. Thus, the image aligns with the user's prompt, requiring no further modifications.
    Updated Objects: [('gray dog #1', [0.186, 0.592, 0.449, 0.408]), ('brown bowl #1', [0.376, 0.194, 0.624, 0.502])]

Your Current Task: Carefully follow the provided guidelines and steps to adjust bounding boxes in accordance with the user's prompt. Ensure adherence to the above output format.

"""
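
# Illustrative sketch (not part of the original pipeline): checking the "Boundary
# Adherence" guideline for boxes in the [top-left x, top-left y, width, height]
# format used above. The helper name `box_within_unit_square` and the tolerance
# `eps` (which absorbs floating-point rounding at the image border) are
# assumptions made for this example.
def box_within_unit_square(box, eps=1e-9):
    """Return True if the whole box lies inside the [0, 1] x [0, 1] image."""
    x, y, w, h = box
    return (
        w >= 0 and h >= 0
        and x >= -eps and y >= -eps
        and x + w <= 1 + eps and y + h <= 1 + eps
    )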


image_edit_template = """# Your Role: Expert Bounding Box Adjuster

## Objective: Manipulate bounding boxes in square images according to user instructions while maintaining visual accuracy and avoiding boundary exceedance.

## Bounding Box Specifications and Manipulations
1. Image Coordinates: Define square images with top-left at [0, 0] and bottom-right at [1, 1].
2. Box Format: [Top-left x, Top-left y, Width, Height]
3. Operations: Include addition, deletion, repositioning, and attribute modification.

## Key Guidelines
1. Alignment: Follow the user's prompt, keeping the specified object count and attributes. Deem the layout incorrect if a described object lacks its specified attributes.
2. Boundary Adherence: Keep bounding box coordinates within [0, 1].
3. Minimal Modifications: Change bounding boxes only if they don't match the user's prompt (i.e., don't modify matched objects).
4. Overlap Reduction: Minimize intersections in new boxes and remove the smallest, least overlapping objects.

## Process Steps
1. Interpret prompts: Read and understand the user's prompt.
2. Implement Changes: Review and adjust current bounding boxes to meet user specifications.
3. Explain Adjustments: Justify the reasons behind each alteration and ensure every adjustment abides by the key guidelines.
4. Output the Result: Present the reasoning first, followed by the updated prompts and objects section, which should include a list of bounding boxes in Python format.

## Examples

- Example 1
    User prompt: Move the green car to the right and make the blue truck larger in the image.
    Current Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207]), ('blue truck #1', [0.350, 0.368, 0.272, 0.208])]
    Reasoning: To move the green car rightward, its x-coordinate needs to be increased from 0.027. The dimensions (height and width) of the blue truck must be enlarged. While adjusting bounding boxes, ensure they do not overlap excessively. All other elements remain unchanged.
    Updated Objects: [('green car #1', [0.327, 0.365, 0.275, 0.207]), ('blue truck #1', [0.350, 0.369, 0.472, 0.408])]

- Example 2
    User prompt: Swap the positions of a green car and a blue truck in this landscape scene with an air balloon.
    Current Objects: [('green car #1', [0.350, 0.369, 0.275, 0.207]), ('blue truck #1', [0.027, 0.365, 0.272, 0.208]), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176])]
    Reasoning: Exchange the locations of the car and the truck by swapping their top-left coordinates; other objects remain unchanged.
    Updated Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207]), ('blue truck #1', [0.350, 0.369, 0.272, 0.208]), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176])]

- Example 3
    User prompt: Change the color of the dolphin from blue to pink in this oil painting of a dolphin and a steamboat.
    Current Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194]), ('blue dolphin #1', [0.027, 0.324, 0.246, 0.160])]
    Reasoning: Alter only the dolphin's color from blue to pink, without modifying other elements.
    Updated Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194]), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160])]

- Example 4
    User prompt: Remove the leftmost bowl in this photo with two bowls and a dog.
    Current Objects: [('dog #1', [0.186, 0.592, 0.449, 0.408]), ('bowl #1', [0.376, 0.194, 0.324, 0.324]), ('bowl #2', [0.676, 0.494, 0.324, 0.324])]
    Reasoning: There are two bowls in the image, and bowl #1 is identified as the leftmost one because its x-coordinate (0.376) is smaller than that of bowl #2 (0.676). Thus, eliminate bowl #1 without modifying any remaining instances.
    Updated Objects: [('dog #1', [0.186, 0.592, 0.449, 0.408]), ('bowl #2', [0.676, 0.494, 0.324, 0.324])]

- Example 5
    User prompt: Add a pink bowl between two existing bowls in this photo.
    Current Objects: [('bowl #1', [0.076, 0.494, 0.324, 0.324]), ('bowl #2', [0.676, 0.494, 0.324, 0.324])]
    Reasoning: There are two bowls in the image. To add a pink bowl between them, its x-coordinate should lie between 0.076 and 0.676 and its y-coordinate should stay at 0.494. When adding the object, avoid overlapping the existing objects and make sure [top-left x, top-left y, top-left x + box width, top-left y + box height] all lie between 0 and 1.
    Updated Objects: [('bowl #1', [0.076, 0.494, 0.324, 0.324]), ('bowl #2', [0.676, 0.494, 0.324, 0.324]), ('pink bowl #1', [0.400, 0.494, 0.276, 0.276])]

Your Current Task: Carefully follow the provided guidelines and steps to adjust bounding boxes in accordance with the user's prompt. Ensure adherence to the above output format.

"""
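
# Illustrative sketch (not part of the original pipeline): measuring how much two
# boxes intersect, which is one way to check the "Overlap Reduction" guideline.
# Boxes use the [top-left x, top-left y, width, height] format from the templates.
# The helper name `box_iou` and the use of intersection-over-union as the overlap
# measure are assumptions made for this example.
def box_iou(box_a, box_b):
    """Return the intersection-over-union of two [x, y, w, h] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0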



spot_difference_template_FoR = """# Your Role: Expert Bounding Box Adjuster

## Objective: Manipulate bounding boxes in square images according to the user prompt while maintaining visual accuracy.

## Object Specifications and Manipulations
1. Image Coordinates: Define square images with top-left at [0, 0] and bottom-right at [1, 1].
2. Object Format: (object, box, depth, orientation)
3. Box Format: [Top-left x, Top-left y, Width, Height]
4. Depth: Define depth of the object from furthest at 0 and nearest at 1.
5. Orientation Format: The orientation of the object, which can be None, "left", "right", "front", or "back".
6. Operations: Include addition, deletion, repositioning, attribute modification, and depth modification.

## Key Guidelines
1. Alignment: Follow the user's prompt, keeping the specified object count and attributes. Deem the layout incorrect if a described object lacks its specified attributes.
2. Boundary Adherence: Keep bounding box coordinates within [0, 1].
3. Depth Adherence: Keep average depth within [0, 1].
4. Orientation Adherence: Change an object's orientation only when the prompt specifies one. If the prompt specifies nothing, do not change the orientation of the object.
5. Minimal Modifications: Change bounding boxes or depth only if they don't match the user's prompt (i.e., don't modify matched objects).
6. Overlap Reduction: Minimize intersections in new boxes and remove the smallest, least overlapping objects.

## Process Steps
1. Interpret prompts: Read and understand the user's prompt.
2. Implement Changes: Review and adjust current bounding boxes to meet user specifications.
3. Explain Adjustments: Justify the reasons behind each alteration and ensure every adjustment abides by the key guidelines.
4. Output the Result: Present the reasoning first, followed by the updated objects section, which should include a list of bounding boxes in Python format.

## Examples

- Example 1
    User prompt: A realistic image of landscape scene depicting a green car parking on the left of a blue truck, with a red air balloon and a bird in the sky
    Current Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207], 0.6, None), ('blue truck #1', [0.350, 0.368, 0.272, 0.208], 0.7, None), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.4, None)]
    Reasoning: The prompt mentions a bird in the sky, but none is present. Add a bird, ensuring all coordinates and dimensions remain within [0, 1].
    Updated Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207], 0.6, None), ('blue truck #1', [0.350, 0.368, 0.272, 0.208], 0.7, None), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.4, None), ('bird #1', [0.385, 0.054, 0.186, 0.130], 0.3, None)]

- Example 2
    User prompt: A realistic image of landscape scene depicting a green car parking on the right of a blue truck, with a red air balloon and a bird in the sky
    Current Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207], 0.79, "left"), ('blue truck #1', [0.350, 0.369, 0.272, 0.208], 0.68, "right"), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.15, None)]
    Reasoning: The relative positions of the green car and blue truck do not match the prompt, and the bird is missing. Swap the positions of the green car and blue truck and add a bird in the sky, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('green car #1', [0.350, 0.369, 0.275, 0.207], 0.79, "left"), ('blue truck #1', [0.027, 0.365, 0.272, 0.208], 0.68, "right"), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.15, None), ('bird #1', [0.485, 0.054, 0.186, 0.130], 0.15, "front")]

- Example 3
    User prompt: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
    Current Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left"), ('blue dolphin #1', [0.158, 0.454, 0.376, 0.290], 0.26, "right")]
    Reasoning: The prompt mentions only one dolphin, but two are present. Thus, remove one dolphin to match the prompt, ensuring all coordinates and dimensions stay within [0, 1].
    Updated Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left")]

- Example 4
    User prompt: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
    Current Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left")]
    Reasoning: The prompt specifies a pink dolphin, but there's only a generic one. The attribute needs to be changed.
    Updated Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left")]

- Example 5
    User prompt: a backpack on the right of a car from car's perspective and a car on the left
    Current Objects: [('backpack #1', [0.302, 0.293, 0.335, 0.194], 0.63, None), ('car #1', [0.027, 0.324, 0.246, 0.160], 0.25, "left")]
    Reasoning: The prompt specifies that a backpack is on the right of a car from the car's perspective. The prompt does not specify the car's orientation, but the current car is facing left. Therefore, from the camera's perspective, the backpack should be behind the car. The average depth of the backpack (0.63) is higher than that of the car (0.25), which does not match the prompt. Swap the average depths of the car and the backpack to match the prompt, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('backpack #1', [0.302, 0.293, 0.335, 0.194], 0.25, None), ('car #1', [0.027, 0.324, 0.246, 0.160], 0.63, "left")]

- Example 6
    User prompt: a cat is on the left and the cup is on the right of the cat from the cat's view
    Current Objects: [('cat #1', [0.169, 0.563, 0.323, 0.291], 0.901, 'right'), ('cup #1', [0.59, 0.186, 0.408, 0.814], 0.732, None)]
    Reasoning: The prompt specifies that a cat is on the left, which is currently correct. The prompt does not specify the cat's orientation, so the current "right" orientation is acceptable. The prompt then specifies that a cup is to the right of the cat from the cat's view. This is the same as the cup being in front of the cat from the camera's perspective. However, the cup's depth (0.732) is lower than the cat's depth (0.901). Increase the cup's depth and lower the cat's depth, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('cat #1', [0.169, 0.563, 0.323, 0.291], 0.405, 'right'), ('cup #1', [0.59, 0.186, 0.408, 0.814], 0.901, None)]

- Example 7
    User prompt: A cow is in front of a sheep from the camera angle. The sheep is facing right relative to the camera.
    Current Objects: [('cow #1', [0.354, 0.365, 0.285, 0.385], 0.41, None), ('sheep #1', [0.608, 0.120, 0.285, 0.200], 0.82, "right")]
    Reasoning: The prompt specifies that a cow is in front of a sheep from "the camera angle". Therefore, the spatial relation is that a cow is in front of a sheep from the camera's perspective. However, the depth of the cow (0.41) is lower than that of the sheep (0.82), which does not match the prompt. Swap the average depths of the cow and the sheep to match the prompt, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('cow #1', [0.354, 0.365, 0.285, 0.385], 0.82, None), ('sheep #1', [0.608, 0.120, 0.285, 0.200], 0.41, "right")]

- Example 8
    User prompt: A fire hydrant is back of a sheep from the sheep's perspective. The sheep is facing left relative to the camera.
    Current Objects: [('fire hydrant #1', [0.113, 0.365, 0.251, 0.251], 0.64, None), ('sheep #1', [0.608, 0.120, 0.251, 0.251], 0.52, "left")]
    Reasoning: The prompt specifies that a fire hydrant is back of a sheep from "the sheep's perspective". Since the prompt states that the sheep is facing left relative to the camera, the fire hydrant should be to the right of the sheep from the camera's perspective. The relative positions of the fire hydrant and sheep do not match the prompt, since the fire hydrant's bounding box is to the left of the sheep's bounding box. Swap the positions of the fire hydrant and sheep to match the prompt, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('fire hydrant #1', [0.608, 0.120, 0.251, 0.251], 0.64, None), ('sheep #1', [0.113, 0.365, 0.251, 0.251], 0.52, "left")]

- Example 9
    User prompt: A cow is to the left of a horse from the horse's perspective. The horse is facing right relative to the camera.
    Current Objects: [('cow #1', [0.113, 0.365, 0.352, 0.352], 0.83, None), ('horse #1', [0.608, 0.120, 0.352, 0.352], 0.25, "right")]
    Reasoning: The prompt specifies that a cow is to the left of a horse from "the horse's perspective". Since the prompt states that the horse is facing right relative to the camera, the cow should be behind the horse from the camera's perspective. However, the depth of the cow (0.83) is higher than that of the horse (0.25), which does not match the prompt. Swap the average depths of the cow and the horse to match the prompt, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('cow #1', [0.113, 0.365, 0.352, 0.352], 0.25, None), ('horse #1', [0.608, 0.120, 0.352, 0.352], 0.83, "right")]

- Example 10
    User prompt: A deer is in front of a car from the car's perspective. The car is facing toward the camera.
    Current Objects: [('deer #1', [0.454, 0.365, 0.285, 0.385], 0.64, None), ('car #1', [0.608, 0.120, 0.285, 0.200], 0.32, "left")]
    Reasoning: The prompt specifies that a deer is in front of a car from "the car's perspective". Since the prompt states that the car is facing toward the camera, the deer should be in front of the car from the camera's perspective. The average depth of the deer (0.64) is higher than that of the car (0.32), which matches the prompt. However, the car's current orientation is "left", so it needs to be changed to "front".
    Updated Objects: [('deer #1', [0.454, 0.365, 0.285, 0.385], 0.64, None), ('car #1', [0.608, 0.120, 0.285, 0.200], 0.32, "front")]

- Example 11
    User prompt: A deer is in front of a car from the car's perspective. The car is facing away from the camera.
    Current Objects: [('deer #1', [0.454, 0.165, 0.285, 0.385], 0.42, None), ('car #1', [0.608, 0.620, 0.285, 0.200], 0.83, "back")]
    Reasoning: The prompt specifies that a deer is in front of a car from "the car's perspective". Since the prompt states that the car is facing away from the camera, the deer should be behind the car from the camera's perspective. The average depth of the deer (0.42) is lower than that of the car (0.83). Thus, the image aligns with the user's prompt, requiring no further modifications.
    Updated Objects: [('deer #1', [0.454, 0.165, 0.285, 0.385], 0.42, None), ('car #1', [0.608, 0.620, 0.285, 0.200], 0.83, "back")]
    
- Example 12
    User prompt: A realistic photo of a scene with a brown bowl on the right and a gray dog on the left
    Current Objects: [('gray dog #1', [0.186, 0.592, 0.449, 0.408], 0.45, "front"), ('brown bowl #1', [0.376, 0.194, 0.624, 0.502], 0.53, None)]
    Reasoning: The leftmost coordinate (0.186) of the gray dog's bounding box is positioned to the left of the leftmost coordinate (0.376) of the brown bowl, while the rightmost coordinate (0.186 + 0.449) of the bounding box has not extended beyond the rightmost coordinate of the bowl. Thus, the image aligns with the user's prompt, requiring no further modifications.
    Updated Objects: [('gray dog #1', [0.186, 0.592, 0.449, 0.408], 0.45, "front"), ('brown bowl #1', [0.376, 0.194, 0.624, 0.502], 0.53, None)]

Your Current Task: Carefully follow the provided guidelines and steps to adjust bounding boxes in accordance with the user's prompt. Ensure adherence to the above output format.

"""
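
# Illustrative sketch (not part of the original pipeline): the frame-of-reference
# conversions worked through in the examples above, expressed with direction
# vectors. The helper name `to_camera_relation` and the vector encoding are
# assumptions made for this example. Directions are (x, z) pairs on the ground
# plane: +x is the camera's right and +z points away from the camera, so the
# camera relation "front" means nearer to the camera and "back" means farther.
_FACING_TO_VECTOR = {"front": (0, -1), "back": (0, 1), "left": (-1, 0), "right": (1, 0)}
_VECTOR_TO_CAMERA_RELATION = {vec: name for name, vec in _FACING_TO_VECTOR.items()}


def to_camera_relation(relation, facing):
    """Convert a relation stated in the object's frame into the camera's frame.

    `relation` is "front", "back", "left", or "right" from the object's
    perspective; `facing` is the object's orientation relative to the camera.
    """
    fx, fz = _FACING_TO_VECTOR[facing]
    # An object facing away from the camera shares the camera's right-hand side,
    # so its right vector is the forward vector (fx, fz) mapped to (fz, -fx).
    rx, rz = fz, -fx
    offsets = {
        "front": (fx, fz),
        "back": (-fx, -fz),
        "right": (rx, rz),
        "left": (-rx, -rz),
    }
    return _VECTOR_TO_CAMERA_RELATION[offsets[relation]]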


orientation_detection_prompt = "You are an expert at determining the orientation of objects in an image. You will be provided with an image and a question about the orientation of one of the objects in that image. Your task is to describe the orientation of the specified object using one of the following options: \"facing forward\", \"facing to the left\", \"facing to the right\", \"facing backward\". If necessary, you can make a reasonable guess based on the image. Answer only, without any explanation."
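
# Illustrative sketch (not part of the original pipeline): mapping the four answer
# options requested by orientation_detection_prompt onto the short orientation
# labels used in the object tuples ("front", "left", "right", "back"). The helper
# name `normalize_orientation` and the choice to return None for unrecognized
# answers are assumptions made for this example.
_ORIENTATION_LABELS = {
    "facing forward": "front",
    "facing to the left": "left",
    "facing to the right": "right",
    "facing backward": "back",
}


def normalize_orientation(answer):
    """Return the short label for a model answer, or None if it is not recognized."""
    # Tolerate surrounding whitespace, quotes, a trailing period, and capitalization.
    cleaned = answer.strip().strip('."\'').lower()
    return _ORIENTATION_LABELS.get(cleaned)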


convert_perspetive_prompt = """# Your Role: Expert on spatial relation in multiple perspectives

## Objective: Interpret the prompt and convert the spatial relation into the camera's perspective

## Image and Object Specification
1. Image Coordinates: Define square images with top-left at [0, 0] and bottom-right at [1, 1].
2. Four pieces of information are given for each object, in order: object name, bounding box, depth, and facing direction.
3. Object Format: (object, box, depth, facing direction)
4. Box Format: [Top-left x, Top-left y, Width, Height]
5. Depth: Define depth of the object from furthest at 0 and nearest at 1.
6. Facing Direction: The orientation of the object relative to the camera, which can be None, left, forward-left, backward-left, right, forward-right, backward-right, front (facing forward or facing toward), or back (facing backward or facing away).

## Key Guidelines
1. Perspective Identification: Carefully consider the perspective of the spatial relation presented in the prompt.
2. Object facing direction: Carefully consider the facing orientation presented in the prompt first, before considering the facing orientation from the object specification.
3. Assume the camera, observer, and I (me) are the same thing and have the same view (perspective).
4. Study the examples closely to see how the conversion should be made.
<RULES>


## Process Steps
1. Read and understand the user prompt (scene description).
2. Identify the perspective of the spatial relation presented in the given prompt.
3. Check whether the facing direction is provided in the prompt.
4. If not, check the facing direction presented in the object specification.
5. Explain your understanding (reasoning) and then convert the perspective into the camera's perspective.
6. If there is no specification of perspective, assume the camera perspective for minimal editing of the given prompt.
7. Do not modify other parts of the prompt except for the spatial relation(s).
8. Do not update the objects; only modify the prompt.

## Examples

- Example 1
    User prompt: a backpack on the right of a car from car's perspective and a car on the left
    Current Objects: [('backpack #1', [0.302, 0.293, 0.335, 0.194], 0.63, None), ('car #1', [0.027, 0.324, 0.246, 0.160], 0.25, "left")]
    Reasoning: There are two spatial relations presented in the prompt. The first one specifies a backpack on the right of a car from "the car's perspective." The prompt does not specify the car's facing direction. Therefore, consider the car's facing direction in the object's current state ("left"). The car is facing the left of the photo, so the right of the car from the car's perspective is the back direction from the camera. The first spatial relation in the camera's perspective is therefore that the backpack is back of the car. The second spatial relation is a car on the left. This does not specify the perspective, so assume the camera's perspective. Therefore, no update is needed for the second spatial relation.
    Updated prompt: a backpack on the back of a car from camera's perspective and a car on the left

- Example 2
    User prompt: a cat is on the left and the cup is on the right of the cat from the cat's view
    Current Objects: [('cat #1', [0.169, 0.563, 0.323, 0.291], 0.901, 'right'), ('cup #1', [0.59, 0.186, 0.408, 0.814], 0.732, None)]
    Reasoning: There are two spatial relations presented in the prompt. The first spatial relation is a cat on the left. The prompt does not specify the perspective, so assume the camera's perspective. Therefore, no update is needed for the first spatial relation. The second one specifies that the cup is on the right of the cat from "the cat's view." The prompt does not specify the cat's facing direction. Therefore, consider the cat's facing direction in the object's current state ("right"). The cat is facing the right of the photo, so the right of the cat from the cat's perspective is the front direction from the camera. The second spatial relation in the camera's perspective is therefore that the cup is on the front of the cat from the camera's view.
    Updated prompt: a cat is on the left and the cup is on the front of the cat from the camera's view

- Example 3
    User prompt: A cow is in front of a sheep from the camera angle. The sheep is facing right relative to the camera.
    Current Objects: [('cow #1', [0.354, 0.365, 0.285, 0.385], 0.41, None), ('sheep #1', [0.608, 0.120, 0.285, 0.200], 0.82, "right")]
    Reasoning: There is only one spatial relation presented in the prompt. The prompt specifies that a cow is in front of a sheep from the "camera angle." This spatial relation is from the camera's perspective. Therefore, there is no need for change.
    Updated prompt: A cow is in front of a sheep from the camera angle. The sheep is facing right relative to the camera.

- Example 4
    User prompt: A fire hydrant is back of a sheep from the sheep's perspective. The sheep is facing away from the camera.
    Current Objects: [('fire hydrant #1', [0.113, 0.365, 0.251, 0.251], 0.64, None), ('sheep #1', [0.608, 0.120, 0.251, 0.251], 0.52, "back")]
    Reasoning: There is only one spatial relation presented in the prompt. The prompt specifies that a fire hydrant is back of a sheep from "the sheep's perspective." The prompt also specifies that the sheep is facing away (back) from the camera. So, the back of the sheep is the front direction of the camera. The updated spatial relation is that a fire hydrant is front of a sheep from the camera's perspective.
    Updated prompt: A fire hydrant is front of a sheep from the camera's perspective. The sheep is facing away from the camera.

- Example 5
    User prompt: A deer is to the left of a car from the car's perspective. The car is facing away from the camera.
    Current Objects: [('deer #1', [0.454, 0.165, 0.285, 0.385], 0.42, None), ('car #1', [0.608, 0.620, 0.285, 0.200], 0.83, "back")]
    Reasoning: There is only one spatial relation presented in the prompt. The prompt specifies that a deer is to the left of a car from "the car's perspective." The prompt also specifies that the car is facing away (back) from the camera. So, the left of a car that is facing away is the left direction of the camera. The updated spatial relation is that a deer is to the left of a car from the camera's perspective.
    Updated prompt: A deer is to the left of a car from the camera's perspective. The car is facing away from the camera.

- Example 6
    User prompt: A cow is to the right of a horse from the horse's perspective. The horse is facing toward relative to the camera.
    Current Objects: [('Cow #1', [0.113, 0.365, 0.352, 0.352], 0.83, None), ('horse #1', [0.608, 0.120, 0.352, 0.352], 0.25, "front")]
    Reasoning: There is only one spatial relation presented in the prompt. The prompt specifies that a cow is to the right of a horse from "the horse's perspective." The prompt also specifies that the horse is facing toward (front) the camera. So, the right of a horse that is facing toward the camera is the left direction of the camera. The updated spatial relation is that a cow is to the left of a horse from the camera's perspective.
    Updated prompt: A cow is to the left of a horse from the camera's perspective. The horse is facing toward relative to the camera.

- Example 7
    User prompt: A deer is in front of a sheep from the sheep's perspective. The sheep is facing toward relative to the camera.
    Current Objects: [('deer #1', [0.454, 0.365, 0.285, 0.385], 0.64, None), ('sheep #1', [0.608, 0.120, 0.285, 0.200], 0.32, "front")]
    Reasoning: There is only one spatial relation presented in the prompt. The prompt specifies that a deer is in front of a sheep from "the sheep's perspective." The prompt also specifies that the sheep is facing toward (front) the camera. So, the front of a sheep that is facing toward the camera is the front direction of the camera. The updated spatial relation is that a deer is in front of a sheep from the camera's perspective.
    Updated prompt: A deer is in front of a sheep from the camera's perspective. The sheep is facing toward relative to the camera.

- Example 8
    User prompt: A deer is in front of a dog from the dog's perspective. The dog is facing right relative to the camera.
    Current Objects: [('deer #1', [0.186, 0.592, 0.449, 0.408], 0.45, "front"), ('dog #1', [0.376, 0.194, 0.624, 0.502], 0.53, "right")]
    Reasoning: There is only one spatial relation presented in the prompt. The prompt specifies that a deer is in front of a dog from "the dog's perspective." The prompt also specifies that the dog is facing to the right of the camera. So, the front of a dog that is facing right is the right direction of the camera. The updated spatial relation is that a deer is to the right of a dog from the camera's perspective.
    Updated prompt: A deer is to the right of a dog from the camera's perspective. The dog is facing right relative to the camera.

- Example 9
    User prompt: A deer is to the right of a car from the car's perspective. The car is facing away from the camera.
    Current Objects: [('deer #1', [0.454, 0.165, 0.285, 0.385], 0.42, None), ('car #1', [0.608, 0.620, 0.285, 0.200], 0.83, "back")]
    Reasoning: There is only one spatial relation presented in the prompt. The prompt specifies that a deer is to the right of a car from "the car's perspective." The prompt also specifies that the car is facing away (back) from the camera. So, the right of a car that is facing away is the right direction of the camera. Do not reverse the left/right relation here; it is reversed only when the object is facing toward the camera. The updated spatial relation is that a deer is to the right of a car from the camera's perspective.
    Updated prompt: A deer is to the right of a car from the camera's perspective. The car is facing away from the camera.

Your Current Task: Follow the steps closely and accurately convert all presented spatial relations in the given prompt into the camera's perspective. Ensure adherence to the above output format.

"""



spot_difference_template_FoR2 = """# Your Role: Expert Bounding Box Adjuster

## Objective: Manipulate bounding boxes in square images according to the user prompt while maintaining visual accuracy.

## Object Specifications and Manipulations
1. Image Coordinates: Define square images with top-left at [0, 0] and bottom-right at [1, 1].
2. Object Format: (object, box, depth, orientation)
3. Box Format: [Top-left x, Top-left y, Width, Height]
4. Depth: The depth of the object, from 0 (furthest from the camera) to 1 (nearest to the camera).
5. Facing Direction: The orientation of the object relative to the camera, which can be None, left, forward-left, backward-left, right, forward-right, backward-right, front (facing forward or facing toward), or back (facing backward or facing away).
6. Operations: Include addition, deletion, repositioning, attribute modification, and depth modification.

## Key Guidelines
1. Alignment: Follow the user's prompt, keeping the specified object count and attributes. Deem an object incorrect if it lacks the specified attributes.
2. Boundary Adherence: Keep bounding box coordinates within [0, 1].
3. Depth Adherence: Keep average depth within [0, 1].
4. Orientation Adherence: Change an object's orientation only when the prompt requires it. If the prompt does not specify an orientation, do not change the orientation of the object.
5. Minimal Modifications: Change bounding boxes or depth only if they don't match the user's prompt (i.e., don't modify matched objects).
6. Overlap Reduction: Minimize intersections in new boxes and remove the smallest, least overlapping objects.
7. Avoid obscurity: Avoid letting one bounding box (higher depth value/near the camera) obscure another bounding box (lower depth value/far from the camera). If this occurs, consider moving the objects.

## Process Steps
1. Interpret prompts: Read and understand the user's prompt.
2. Implement Changes: Review and adjust current bounding boxes to meet user specifications.
3. Explain Adjustments: Justify the reasons behind each alteration and ensure every adjustment abides by the key guidelines.
4. Output the Result: Present the reasoning first, followed by the updated objects section, which should include a list of bounding boxes in Python format.

## Examples

- Example 1
    User prompt: A realistic image of landscape scene depicting a green car parking on the left of a blue truck, with a red air balloon and a bird in the sky
    Current Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207], 0.6, None), ('blue truck #1', [0.350, 0.369, 0.272, 0.208], 0.7, None), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.4, None)]
    Reasoning: The prompt mentions a bird in the sky, but no bird is present. Add a bird in the sky as per the prompt, ensuring all coordinates and dimensions remain within [0, 1].
    Updated Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207], 0.6, None), ('blue truck #1', [0.350, 0.369, 0.272, 0.208], 0.7, None), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.4, None), ('bird #1', [0.385, 0.054, 0.186, 0.130], 0.3, None)]

- Example 2
    User prompt: A realistic image of landscape scene depicting a green car parking on the right of a blue truck, with a red air balloon and a bird in the sky
    Current Objects: [('green car #1', [0.027, 0.365, 0.275, 0.207], 0.79, "left"), ('blue truck #1', [0.350, 0.369, 0.272, 0.208], 0.68, "right"), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.15, None)]
    Reasoning: The relative positions of the green car and blue truck do not match the prompt, and the bird is missing. Swap the positions of the green car and blue truck and add a bird in the sky to match the prompt, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('green car #1', [0.350, 0.369, 0.275, 0.207], 0.79, "left"), ('blue truck #1', [0.027, 0.365, 0.272, 0.208], 0.68, "right"), ('red air balloon #1', [0.086, 0.010, 0.189, 0.176], 0.15, None), ('bird #1', [0.485, 0.054, 0.186, 0.130], 0.15, "front")]

- Example 3
    User prompt: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
    Current Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left"), ('blue dolphin #1', [0.158, 0.454, 0.376, 0.290], 0.26, "right")]
    Reasoning: The prompt mentions only one dolphin, but two are present. Thus, remove one dolphin to match the prompt, ensuring all coordinates and dimensions stay within [0, 1].
    Updated Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left")]

- Example 4
    User prompt: An oil painting of a pink dolphin jumping on the left of a steam boat on the sea
    Current Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left")]
    Reasoning: The prompt specifies a pink dolphin, but there's only a generic one. The attribute needs to be changed.
    Updated Objects: [('steam boat #1', [0.302, 0.293, 0.335, 0.194], 0.76, "front"), ('pink dolphin #1', [0.027, 0.324, 0.246, 0.160], 0.23, "left")]

- Example 5
    User prompt: A cow is in front of a sheep from the camera angle. The sheep is facing right relative to the camera.
    Current Objects: [('cow #1', [0.354, 0.365, 0.285, 0.385], 0.41, None), ('sheep #1', [0.608, 0.120, 0.285, 0.200], 0.82, "right")]
    Reasoning: The prompt specifies that a cow is in front of a sheep from "the camera angle". Therefore, the spatial relation is that a cow is in front of a sheep from the camera's perspective. However, the depth of the cow is lower than that of the sheep, which does not match the prompt. Swap the average depths of the cow and the sheep and adjust the sizes of the cow and sheep to match the prompt and the updated depths, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('cow #1', [0.354, 0.365, 0.445, 0.601], 0.82, None), ('sheep #1', [0.608, 0.120, 0.185, 0.130], 0.41, "right")]

- Example 6
    User prompt: A fire hydrant is right of the sheep from the camera's perspective. The sheep is facing left relative to the camera.
    Current Objects: [('fire hydrant #1', [0.113, 0.365, 0.251, 0.251], 0.64, None), ('sheep #1', [0.608, 0.120, 0.251, 0.251], 0.52, "left")]
    Reasoning: The prompt specifies that a fire hydrant is right of the sheep from the camera's perspective. Therefore, the relative positions of the fire hydrant and sheep do not match the prompt since the fire hydrant’s bounding box is to the left of the sheep’s bounding box. Swap positions of the fire hydrant and sheep to match the prompt, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('fire hydrant #1', [0.608, 0.120, 0.251, 0.251], 0.64, None), ('sheep #1', [0.113, 0.365, 0.251, 0.251], 0.52, "left")]

- Example 7
    User prompt: A cow is behind a horse from the camera's perspective. The horse is facing right relative to the camera.
    Current Objects: [('Cow #1', [0.113, 0.365, 0.352, 0.352], 0.83, None), ('horse #1', [0.608, 0.120, 0.352, 0.352], 0.75, "right")]
    Reasoning: The prompt specifies that a cow is behind a horse from the camera's perspective. However, the depth of the cow (0.83) is higher than that of the horse (0.75), which does not match the prompt. Since the depths of the cow and the horse are close to each other, only swap their depths, while keeping all coordinates and dimensions within [0, 1].
    Updated Objects: [('Cow #1', [0.113, 0.365, 0.352, 0.352], 0.75, None), ('horse #1', [0.608, 0.120, 0.352, 0.352], 0.83, "right")]

- Example 8
    User prompt: A deer is in front of a car from the camera's perspective. The car is facing toward the camera.
    Current Objects: [('deer #1', [0.454, 0.365, 0.285, 0.385], 0.64, None), ('car #1', [0.608, 0.120, 0.285, 0.200], 0.32, "left")]
    Reasoning: The prompt specifies that a deer is in front of a car from the camera's perspective. The average depth of the deer (0.64) is higher than the average depth of the car (0.32), which matches the prompt. However, the orientation of the car is left, while the prompt states that the car is facing toward the camera. The orientation of the car needs to be changed to front.
    Updated Objects: [('deer #1', [0.454, 0.365, 0.285, 0.385], 0.64, None), ('car #1', [0.608, 0.120, 0.285, 0.200], 0.32, "front")]

- Example 9
    User prompt: A deer is in front of a car from the car's perspective. The car is facing away from the camera.
    Current Objects: [('deer #1', [0.454, 0.165, 0.285, 0.385], 0.42, None), ('car #1', [0.608, 0.620, 0.285, 0.200], 0.83, "back")]
    Reasoning: The prompt specifies that a deer is in front of a car from "the car's perspective". Since the prompt states that the car is facing away from the camera, the spatial relation from the camera should be that a deer is back of a car from the camera's perspective. The average depth of the deer (0.42) is lower than the average depth of the car (0.83). Thus, the image aligns with the user's prompt, requiring no further modifications.
    Updated Objects: [('deer #1', [0.454, 0.165, 0.285, 0.385], 0.42, None), ('car #1', [0.608, 0.620, 0.285, 0.200], 0.83, "back")]
    
- Example 10
    User prompt: A realistic photo of a scene with a brown bowl on the right and a gray dog on the left
    Current Objects: [('gray dog #1', [0.186, 0.592, 0.449, 0.408], 0.45, "front"), ('brown bowl #1', [0.376, 0.194, 0.624, 0.502], 0.53, None)]
    Reasoning: The leftmost coordinate (0.186) of the gray dog's bounding box is positioned to the left of the leftmost coordinate (0.376) of the brown bowl, while the rightmost coordinate (0.186 + 0.449) of the bounding box has not extended beyond the rightmost coordinate of the bowl. Thus, the image aligns with the user's prompt, requiring no further modifications.
    Updated Objects: [('gray dog #1', [0.186, 0.592, 0.449, 0.408], 0.45, "front"), ('brown bowl #1', [0.376, 0.194, 0.624, 0.502], 0.53, None)]

Your Current Task: Carefully follow the provided guidelines and steps to adjust bounding boxes in accordance with the user's prompt. Ensure adherence to the above output format.

"""