purpose: "In-Context Learning for DoorKey"
domain: MiniGrid-DoorKey-5x5-v0
date: "2025Mar13"
icl_prompt_goal_advising: ""
icl_prompt_action_advising: "
\n Below are examples about reasoning over the best action. \n
Question: You are in the following state represented in ascii art:
#####
#.D.#
#^#.#
#.#G#
#####
The door is locked. You have the key.
Which action do you choose? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"action\": <ACTION>}
Where <ACTION> is one of \"TURN LEFT\", \"TURN RIGHT\", \"MOVE FORWARD\", \"PICK UP THE KEY\" or \"UNLOCK THE DOOR\", <REASONING> is a string of your thinking steps.
\n
Answer:
{\"reasoning\": \"The agent is one tile below and to the left of the door, facing up. We first need to move forward to get adjacent to the door tile. Then we can turn to face the door and unlock it. Therefore, the best action now is to move forward.\", \"action\": \"MOVE FORWARD\"}
\n
Question: You are in the following state represented in ascii art:
#####
#.#.#
#.D>#
#.#G#
#####
The door is unlocked. You have the key.
Which action do you choose? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"action\": <ACTION>}
Where <ACTION> is one of \"TURN LEFT\", \"TURN RIGHT\", \"MOVE FORWARD\", \"PICK UP THE KEY\" or \"UNLOCK THE DOOR\", <REASONING> is a string of your thinking steps.
\n
Answer: {\"reasoning\": \"The agent is facing right into a wall, so moving forward is not possible. The goal is directly below the agent. Turning right makes the agent face down toward the goal, after which it can move forward to reach it.\", \"action\": \"TURN RIGHT\"}
\n
Question: You are in the following state represented in ascii art:
#####
#.#.#
#<D.#
#K#G#
#####
The door is locked. You do not have the key. 
Which action do you choose? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"action\": <ACTION>}
Where <ACTION> is one of \"TURN LEFT\", \"TURN RIGHT\", \"MOVE FORWARD\", \"PICK UP THE KEY\" or \"UNLOCK THE DOOR\", <REASONING> is a string of your thinking steps.
\n
Answer: {\"reasoning\": \"We are currently facing left and cannot move forward because there is a wall directly in front of us. The key is located directly below us, so the first step is to turn left to face down toward the key, after which we can pick it up.\", \"action\": \"TURN LEFT\"}
"
icl_prompt_delta_action: ""
icl_prompt_preference: "
\n Below are examples about reasoning over the actions given a state. \n
Question: You are in the following state represented in ascii art:
#####
#K#.#
#.D.#
#^#G#
#####
The door is locked. You do not have the key. 
Given MOVE FORWARD or TURN RIGHT, which action is better? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"preference\": <PREFERENCE>}
Where <PREFERENCE> is one of \"FIRST\" or \"SECOND\", <REASONING> is a string of your thinking steps.
\n
Answer: {\"reasoning\": \"We want to get the key, which is above us, so we should move forward and head to it. Turning right points us toward a wall, which doesn’t help.\", \"preference\": \"FIRST\"}
\n
Question: You are in the following state represented in ascii art:
#####
#.#.#
#.>.#
#.#G#
#####
The door is unlocked. You have the key.
Given TURN LEFT or MOVE FORWARD, which action is better? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"preference\": <PREFERENCE>}
Where <PREFERENCE> is one of \"FIRST\" or \"SECOND\", <REASONING> is a string of your thinking steps.
\n
Answer: {\"reasoning\": \"We want to reach the goal, which is one tile to the right and one tile down. Turning left would face us upward where there’s a wall, which doesn’t help. Moving forward brings us closer to the goal.\", \"preference\": \"SECOND\"}
\n
Question: You are in the following state represented in ascii art:
#####
#K#.#
#VD.#
#.#G#
#####
The door is locked. You do not have the key. 
Given TURN RIGHT or PICK UP THE KEY, which action is better? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"preference\": <PREFERENCE>}
Where <PREFERENCE> is one of \"FIRST\" or \"SECOND\", <REASONING> is a string of your thinking steps.
\n
Answer: 
{\"reasoning\": \"We cannot pick up the key because we are not facing it: we face down and the key is directly above us. Turning right starts rotating us toward the key so we can eventually pick it up.\", \"preference\": \"FIRST\"}
"
icl_prompt_binary_feedback: "
Below are examples about how to judge an action. \n
Question: You are in the following state represented in ascii art:
#####
#>#.#
#KD.#
#.#G#
#####
The door is locked. You do not have the key. 
Is action MOVE FORWARD the best action you can take? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"feedback\": <FEEDBACK>}
Where <FEEDBACK> is one of \"YES\" or \"NO\", <REASONING> is a string of your thinking steps.
\n
Answer: {\"reasoning\": \"The agent is facing a wall, so moving forward does nothing and fails to progress toward picking up the key. Turning right to face the key, which is directly below the agent, is a better action.\", \"feedback\": \"NO\"}
\n
Question: You are in the following state represented in ascii art:
#####
#.D.#
#^#.#
#.#G#
#####
The door is locked. You have the key.
Is action MOVE FORWARD the best action you can take? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"feedback\": <FEEDBACK>}
Where <FEEDBACK> is one of \"YES\" or \"NO\", <REASONING> is a string of your thinking steps.
\n
Answer: {\"reasoning\": \"You already have the key, so the immediate goal is to move adjacent to the locked door and unlock it. The tile directly ahead is free, and moving forward places you in position to turn and unlock the door. Therefore, moving forward is the best action.\", \"feedback\": \"YES\"}
\n
Question: You are in the following state represented in ascii art:
#####
#.#.#
#.>.#
#.#G#
#####
The door is unlocked. You have the key.
Is action TURN LEFT the best action you can take? Please think step by step.
Only give the answer in a new line in JSON format:
{\"reasoning\": <REASONING>, \"feedback\": <FEEDBACK>}
Where <FEEDBACK> is one of \"YES\" or \"NO\", <REASONING> is a string of your thinking steps.
\n
Answer:
{\"reasoning\": \"The agent is facing right and needs to move toward the goal on its right and then turn down. Turning left would make the agent face upwards into a wall. Moving forward is the better move.\", \"feedback\": \"NO\"}
"

