purpose: "Image Observation Only"
domain: robomimic
date: "2025 July 31"
base_prompt: "You are a robot engineer. You need to control a robot arm to lift an object. You need to observe the robot's current state and choose which action to take next in order to lift the object. You can move the robot in the X, Y, or Z direction, as well as open/close its gripper. You will also receive an image as your observation. You can choose actions that move the gripper in the X, Y, or Z direction, with additional control of the gripper. The format of the action is:\n    [ resulting x, resulting y, resulting z, ignore, ignore, ignore, resulting gripper state]\nIn the image, you will see that you are facing the robot and that a red cube (the object) is on the table. When dx < 0, the gripper moves away from you. When dx > 0, the gripper moves towards you. When dy < 0, the gripper moves towards the left of the picture. When dy > 0, the gripper moves towards the right of the picture. When dz < 0, the gripper moves towards the table. When dz > 0, the gripper moves away from the table.\n Note on the gripper: Set grip to 1 to open the gripper and -1 to close the gripper. The robot's gripper should be closed if it is beginning to grasp the object, or when it is holding the object. When it is approaching the object, the gripper should be open. If the robot gripper needs to be closed, you should continue to close the gripper, even if it is already closed. Similarly, if the robot gripper needs to be open, you should continue to open the gripper, even if it is already open.
"


if_optimal_prompt_cot: "
Your image observation is:
<PATH>IMAGEOBSERVATION</PATH>
Is action ACTION the best action you can take? Please think step by step.
You should consider the positions of the robot gripper and the object, and how they are related to each other. For example, if the robot gripper is to the left of the object, you should consider moving the robot gripper to the right.
Give only the answer, on a new line, in JSON format:
{\"reasoning\": <REASONING>, \"feedback\": <FEEDBACK>}
where <FEEDBACK> is either \"YES\" or \"NO\" and <REASONING> is a string containing your thinking steps.
"

action_advising_base_prompt_cot: "
Your image observation is:
<PATH>IMAGEOBSERVATION</PATH>
Which action do you choose? Please think step by step.
You should consider the positions of the robot gripper and the object, and how they are related to each other. For example, if the robot gripper is to the left of the object, you should consider moving the robot gripper to the right.
Give only the answer, on a new line, in JSON format:
{\"reasoning\": <REASONING>, \"action\": <ACTION>}
where <ACTION> is the optimal action in the format (dx, dy, dz) and <REASONING> is a string containing your thinking steps.
"

preference_base_prompt_cot: "
Your image observation is:
<PATH>IMAGEOBSERVATION</PATH>
Given ACTION1 and ACTION2, which action is better? Please think step by step.
You should consider the positions of the robot gripper and the object, and how they are related to each other. For example, if the robot gripper is to the left of the object, you should consider moving the robot gripper to the right.
Give only the answer, on a new line, in JSON format:
{\"reasoning\": <REASONING>, \"preference\": <PREFERENCE>}
where <PREFERENCE> is either \"FIRST\" or \"SECOND\" and <REASONING> is a string containing your thinking steps.
"


delta_action_prompt_cot: "
Your image observation is:
<PATH>IMAGEOBSERVATION</PATH>
Given the following actions, which action is best? Please think step by step.
DELTA
You should consider the positions of the robot gripper and the object, and how they are related to each other. For example, if the robot gripper is to the left of the object, you should consider moving the robot gripper to the right.
Give only the answer, on a new line, in JSON format:
{\"reasoning\": <REASONING>, \"index\": <INDEX>}
where <INDEX> is the index of the action you choose from the list above and <REASONING> is a string containing your thinking steps.
"