purpose: "In-context learning, assuming the dynamics are known"
domain: CliffWalking
date: "2025Feb25"
icl_prompt_goal_advising: "
\n Below are examples about reasoning over the best next state. \n
Question: You are in position (3, 0). What coordinates should you move to in the next step? \n
 Answer: {\"reasoning\": \"There are 4 neighbouring coordinates: (2, 0), (4, 0), (3, -1), (3, 1). If I move to (2, 0), it is a safe place. If I move to (4, 0), I will hit the wall as I am in a zero-indexed 4x12 grid and (4, 0) is outside the boundary. If I move to (3, -1), I will similarly hit the wall as (3, -1) is outside the boundary. If I move to (3, 1), I will fall into a hole, since there are holes in (3, 1) ... (3, 10). Therefore moving to (2, 0) is the only safe option.\",\"row\": \"2\", \"column\": \"0\"} \n
 Question: You are in position (2, 11). What coordinates should you move to in the next step? \n
Answer: {\"reasoning\": \"There are 4 neighbouring coordinates: (1, 11), (3, 11), (2, 10) and (2, 12). If I move to (1, 11), I will then need 2 more steps, (1, 11) -> (2, 11) -> (3, 11), to reach the goal (3, 11). If I move to (3, 11), I will reach the goal. If I move to (2, 10), I will then need 2 more steps, (2, 10) -> (2, 11) -> (3, 11), to reach the goal, because (3, 10) is a hole. If I move to (2, 12), I will hit the wall as it is outside the boundary. Therefore (3, 11) is the best option.\",\"row\": \"3\", \"column\": \"11\"} \n
Question: You are in position (1, 5). What coordinates should you move to in the next step? \n
Answer: {\"reasoning\": \"There are 4 neighbouring coordinates: (0, 5), (2, 5), (1, 6), (1, 4). If I move to (0, 5), I will need 9 steps to reach the goal: (0, 5) -> (0, 6) -> (0, 7) -> (0, 8) -> (0, 9) -> (0, 10) -> (0, 11) -> (1, 11) -> (2, 11) -> (3, 11). If I move to (2, 5), I will need 7 steps to reach the goal: (2, 5) -> (2, 6) -> (2, 7) -> (2, 8) -> (2, 9) -> (2, 10) -> (2, 11) -> (3, 11). If I move to (1, 6), I will need 7 steps to reach the goal: (1, 6) -> (1, 7) -> (1, 8) -> (1, 9) -> (1, 10) -> (1, 11) -> (2, 11) -> (3, 11). If I move to (1, 4), I will need 9 steps to reach the goal: (1, 4) -> (1, 5) -> (1, 6) -> (1, 7) -> (1, 8) -> (1, 9) -> (1, 10) -> (1, 11) -> (2, 11) -> (3, 11). Therefore, (1, 6) and (2, 5) are the best next coordinates as they give the shortest path to the goal. I only need one good coordinate, so I will pick one of (1, 6) and (2, 5).\",\"row\": \"2\", \"column\": \"5\"} \n"
icl_prompt_action_advising: "
\n Below are examples about reasoning over the best action. \n
Question: You are in position (3, 0). What is the best action? \n
 Answer: {\"reasoning\": \"There are 4 possible actions: UP, DOWN, LEFT and RIGHT. If I move UP, I reach (2, 0), which is a safe place. If I move DOWN, I will hit the wall as I am in a zero-indexed 4x12 grid. If I move LEFT, similarly I will hit the wall. If I move RIGHT, I will reach (3, 1) and, since there are holes in (3, 1) ... (3, 10), I will fall into a hole. Therefore moving UP is the only safe option.\",\"direction\": \"UP\"} \n
 Question: You are in position (2, 11). What is the best action? \n
Answer: {\"reasoning\": \"There are 4 possible actions: UP, DOWN, LEFT and RIGHT. If I move UP, I will reach (1, 11) and will then need 2 more steps, (1, 11) -> (2, 11) -> (3, 11), to reach the goal (3, 11). If I move DOWN, I will reach the goal (3, 11) in 1 step. If I move LEFT, I will reach (2, 10) and will then need 2 more steps, (2, 10) -> (2, 11) -> (3, 11), to reach the goal, because (3, 10) is a hole. If I move RIGHT, I will hit the wall. Therefore moving DOWN is the best option.\",\"direction\": \"DOWN\"} \n
Question: You are in position (1, 5). What is the best action? \n
Answer: {\"reasoning\": \"There are 4 possible actions: UP, DOWN, LEFT and RIGHT. If I move UP, I will reach (0, 5) and I will need 9 steps to reach the goal: (0, 5) -> (0, 6) -> (0, 7) -> (0, 8) -> (0, 9) -> (0, 10) -> (0, 11) -> (1, 11) -> (2, 11) -> (3, 11). If I move DOWN, I will reach (2, 5) and I will need 7 steps to reach the goal: (2, 5) -> (2, 6) -> (2, 7) -> (2, 8) -> (2, 9) -> (2, 10) -> (2, 11) -> (3, 11). If I move RIGHT, I will reach (1, 6) and I will need 7 steps to reach the goal: (1, 6) -> (1, 7) -> (1, 8) -> (1, 9) -> (1, 10) -> (1, 11) -> (2, 11) -> (3, 11). If I move LEFT, I will reach (1, 4) and I will need 9 steps to reach the goal: (1, 4) -> (1, 5) -> (1, 6) -> (1, 7) -> (1, 8) -> (1, 9) -> (1, 10) -> (1, 11) -> (2, 11) -> (3, 11). Therefore, DOWN and RIGHT are the best actions as they give the shortest path to the goal. I only need one good action, so I will pick one of DOWN and RIGHT.\",\"direction\": \"RIGHT\"} \n
"
icl_prompt_delta_action: ""
icl_prompt_preference: "
\n Below are examples about reasoning over the actions given a state. \n
Question: You are in position (3, 0). Given action UP and RIGHT, which one is better? \n
Answer: {\"reasoning\": \"If I move UP, I will reach (2, 0), which is a safe place. If I move RIGHT, I reach (3, 1) and since there are holes in (3, 1) ... (3, 10), I will fall into a hole. Therefore moving UP is better.\",\"preference\": \"FIRST\"} \n
Question: You are in position (2, 11). Given action UP and DOWN, which one is better? \n
Answer: {\"reasoning\": \"If I move UP, I will reach (1, 11) and then I will need 2 steps as (1, 11) -> (2, 11) -> (3, 11) to reach the goal (3, 11). If I move DOWN, I will reach the goal (3, 11) in 1 step. Therefore moving DOWN is better.\",\"preference\": \"SECOND\"} \n
Question: You are in position (1, 5). Given action RIGHT and LEFT, which one is better? \n
Answer: {\"reasoning\": \"If I move RIGHT, I will reach (1, 6) and I will need 7 steps to reach goal: (1, 6) -> (1, 7) -> (1, 8) -> (1, 9) -> (1, 10) -> (1, 11) -> (2, 11) -> (3, 11). If I move LEFT, I will reach (1, 4) and I will need 9 steps to reach goal: (1, 4) -> (1, 5) -> (1, 6) -> (1, 7) -> (1, 8) -> (1, 9) -> (1, 10) -> (1, 11) -> (2, 11) -> (3, 11). Therefore, RIGHT is better.\",\"preference\": \"FIRST\"} \n
"
icl_prompt_binary_feedback: "
Below are examples about how to judge an action. \n
Question: You are in position (3, 0). Is action RIGHT the best action you can take? \n
Answer: {\"reasoning\": \"The agent should not move to the right because it will move to (3,1) and fall into the hole.\",\"feedback\": \"NO\"} \n
Question: You are in position (2, 11). Is action UP the best action you can take? \n
Answer: {\"reasoning\": \"Because it will move to (1, 11), and you will then need 2 more steps, (1, 11) -> (2, 11) -> (3, 11), to reach the goal. You could instead go DOWN and reach the goal (3, 11) directly. Moving UP increases the distance to the goal (3, 11).\",\"feedback\": \"NO\"} \n
Question: You are in position (2, 11). Is action DOWN the best action you can take? \n
Answer: {\"reasoning\": \"Because it will move to (3,11) and reach the goal.\",\"feedback\": \"YES\"} \n
"

