{
       "Question number": "6",
       "Sub-Question number": "c",
       "Question": "Briefly, why are Q-learning and SARSA designed to learn Q-values rather than just MDP state values V(s)? That is, why learn \"state-action values\" rather than just \"state values\"?",
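       "Solution": "On its own, the value function $V(s)$ does not give enough information to know how to act unless the agent also learns the reward function $r(s,a)$ and the transition model $p(s'|s,a)$. Acting greedily with respect to $V$ requires a one-step lookahead, $\\pi(s) \\in \\arg\\max_a \\left[ r(s,a) + \\gamma \\sum_{s'} p(s'|s,a) V(s') \\right]$, which depends on that model. In comparison, $\\pi(s) \\in \\arg\\max_a Q(s,a)$ tells the agent how to act directly from the Q-values, so model-free methods such as Q-learning and SARSA can choose actions without ever learning $r$ or $p$."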
}