{
       "Semester": "Fall 2018",
       "Question Number": "5",
       "Part": "f",
       "Points": 1.857142857,
       "Topic": "MDPs",
       "Type": "Image",
       "Question": "Consider the following MDP with $k+4$ statess. There are two actions, $a_{1}$ and $a_{2}$. Arrows with no labels represent a transition for both actions with probability 1. Arrows labeled $a / p$ make the transition on action $a$ with probability $p$. States with no label have reward 0 . Two states have reward $+1$, obtained when taking an action in that state. There are $k-2$ states between $s_{1}$ and $s_{k}$, with a deterministic transition on any action (so that once you are in s1 you are guaranteed to end up in $s_{k}$ in $k-1$ steps).\nWe are interested in the infinite-horizon discounted values of some states in this MDP. What is $V\\left(s_{x}\\right)$ when $0<\\gamma<1$ ?",
       "Solution": "\\frac{\\gamma}{2 - \\gamma}"
}