{
       "Question number": "6",
       "Sub-Question number": "f",
       "Question": "State one advantage of SARSA over Q-learning and one advantage of Q-learning over SARSA",
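       "Solution": "Q-learning over SARSA: Q-learning is off-policy, so it can learn the optimal (greedy) policy while it keeps exploring the environment with an epsilon-greedy behaviour policy. It is also less likely to settle on a suboptimal policy. SARSA learns the value of the exploratory policy it actually follows, so it only converges to the optimal policy if exploration becomes \"greedy in the limit\" (GLIE); with a fixed epsilon it learns the best epsilon-greedy policy instead. Q-learning does not need exploration to decay in this way. SARSA over Q-learning: SARSA is simpler, because its update uses the action actually taken in the next state rather than a \"max over a\" of the next-state action values. It also provides risk aversion: its value estimates account for the mistakes caused by exploration, so it tends to learn safer behaviour (e.g. the safe path rather than the cliff-edge path in cliff walking)."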
}