{
       "Semester": "Spring 2022",
       "Question Number": "6",
       "Part": "b",
       "Points": 1.5,
       "Topic": "MDPs",
       "Type": "Image",
       "Question": "Consider the MDP shown above. It has states $S_{0}, \\ldots, S_{6}$ and actions $A, B$. Each arrow is labeled with one or more actions, and a probability value: this means that if any of those actions is chosen from the state at the start of the arrow, then it will make a transition to the state at the end of the arrow with the associated probability.\n\nRewards are associated with states, and independent, in this example, from the action that is taken in that state. Remember that with horizon $H=1$, the agent can collect the reward associated with the state it is in, and then terminates. \nWhat is the optimal value $V_{h-1}(s)=\\max _{a} Q_{h-1}(s, a)$ for each state for horizon $H=1$ with no discounting?",
       "Solution": "i. $V_{h=1}\\left(S_{0}\\right)$ [1]\nii. $V_{h-1}\\left(S_{1}\\right)$ 0\niii. $V_{h=1}\\left(S_{2}\\right)$"
}