purpose: "In-Context Learning for Four Rooms"
domain: MiniGrid-FourRooms-v0
date: "2025Mar22"
icl_prompt_goal_advising: ""
icl_prompt_action_advising: "
\n Below are examples of reasoning about the best action. \n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#.................#\n######.##########.#\n#........#....G.<.#\n#........#........#\n#........#........#\n#.................#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n###################\nWhich action do you choose? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"action\": <ACTION>}\nWhere <ACTION> is one of \"TURN LEFT\", \"TURN RIGHT\" or \"MOVE FORWARD\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"The agent is already facing left and the goal is two tiles to its left, so moving forward brings it closer to the goal.\", \"action\": \"MOVE FORWARD\"}
\n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#...............^.#\n#........#........#\n#.......G#........#\n#........#........#\n##.#############.##\n#.................#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n###################\nWhich action do you choose? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"action\": <ACTION>}\nWhere <ACTION> is one of \"TURN LEFT\", \"TURN RIGHT\" or \"MOVE FORWARD\", <REASONING> is a string of your thinking steps.\n
Answer: \n
{\"reasoning\": \"I need to reach the goal at row=7,col=8. I am currently at row=5,col=16 facing up, and the goal is to my left. To start moving left, I first need to turn left.\", \"action\": \"TURN LEFT\"} 
\n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#.................#\n#........#........#\n#........#........#\n#........#........#\n#.....G..#........#\n#........#........#\n#........#........#\n#####.####.########\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#......V..........#\n###################\nWhich action do you choose? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"action\": <ACTION>}\nWhere <ACTION> is one of \"TURN LEFT\", \"TURN RIGHT\" or \"MOVE FORWARD\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"The agent is facing down at row=17,col=7, but the goal is above at row=6,col=6. Moving forward would hit the wall at row=18. To head toward the goal, the agent must first turn left or right. Turning right from facing down will make the agent face left, allowing movement toward the open passage at col=5 on row=9. Hence, the best next action is to turn right.\", \"action\": \"TURN RIGHT\"}
"
icl_prompt_delta_action: ""
icl_prompt_preference: "
\n Below are examples of reasoning about actions in a given state. \n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#.>...............#\n######.##########.#\n#........#....G...#\n#........#........#\n#........#........#\n#.................#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n###################\nGiven TURN LEFT or MOVE FORWARD, which action is better? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"preference\": <PREFERENCE>}\nWhere <PREFERENCE> is one of \"FIRST\" or \"SECOND\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"The agent needs to move right toward the open passage leading to the goal. Turning left would not help progress at this point, so moving forward is better.\", \"preference\": \"SECOND\"}
\n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#.G......#........#\n#........#........#\n#........#........#\n#........#........#\n#.................#\n#........#........#\n#........#........#\n##.#########.######\n#........#........#\n#........#........#\n#.>......#........#\n#........#........#\n#.................#\n#........#........#\n#........#........#\n#........#........#\n###################\nGiven TURN LEFT or MOVE FORWARD, which action is better? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"preference\": <PREFERENCE>}\nWhere <PREFERENCE> is one of \"FIRST\" or \"SECOND\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"My goal is to reach the G above. I am currently facing right, but I need to go upward. Therefore, I should turn left first to face up before moving forward toward the goal.\", \"preference\": \"FIRST\"}
\n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#G.......<........#\n#........#........#\n#####.########.####\n#........#........#\n#........#........#\n#........#........#\n#.................#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n###################\nGiven TURN RIGHT or MOVE FORWARD, which action is better? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"preference\": <PREFERENCE>}\nWhere <PREFERENCE> is one of \"FIRST\" or \"SECOND\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"The agent is already facing left, and the goal is to the left. Moving forward will bring the agent closer to the goal, whereas turning right would not help immediate progress.\", \"preference\": \"SECOND\"}
"
icl_prompt_binary_feedback: "
\n Below are examples of how to judge an action. \n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#........>........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n###.##########.####\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#.G......#\n#.................#\n#........#........#\n#........#........#\n###################\nIs action TURN RIGHT the best action you can take? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"feedback\": <FEEDBACK>}\nWhere <FEEDBACK> is one of \"YES\" or \"NO\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"The agent is facing right and can move forward without immediately hitting a wall. Turning right would make the agent face downward, where a wall blocks forward movement. It’s more efficient to move forward along the current corridor. Therefore, turning right is not optimal now.\", \"feedback\": \"NO\"}
\n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#........#........#\n#........#........#\n#......V.#........#\n#........#........#\n#........#........#\n#.................#\n#........#........#\n#######.##.########\n#........#........#\n#........#........#\n#........#........#\n#........#...G....#\n#........#........#\n#.................#\n#........#........#\n#........#........#\n###################\nIs action MOVE FORWARD the best action you can take? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"feedback\": <FEEDBACK>}\nWhere <FEEDBACK> is one of \"YES\" or \"NO\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"The agent is facing down at row=4,col=7, and the goal at row=13,col=13 is below and to the right. The tile ahead is open, and moving down brings the agent closer to the goal, while turning would not immediately help. Therefore, moving forward is optimal now.\", \"feedback\": \"YES\"}
\n
Question: \n
You are in the following state represented in ascii art:\n###################\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n#G...........<....#\n#........#........#\n#####.########.####\n#........#........#\n#........#........#\n#........#........#\n#.................#\n#........#........#\n#........#........#\n#........#........#\n#........#........#\n###################\nIs action TURN LEFT the best action you can take? Please think step by step.\nOnly give the answer in a new line in JSON format:\n{\"reasoning\": <REASONING>, \"feedback\": <FEEDBACK>}\nWhere <FEEDBACK> is one of \"YES\" or \"NO\", <REASONING> is a string of your thinking steps.
\n
Answer: \n
{\"reasoning\": \"We are already facing left, which is the direction of the goal. Turning left would make us face down, moving us away from the goal’s direction. Moving forward is better.\", \"feedback\": \"NO\"}
\n
"

