{
       "Question number": "7",
       "Sub-Question number": "e",
       "Question": "Chris wants to use Q-learning to solve a video-game problem in which a ball moves on an $n$ by $n$ pixel screen, similar to the one we studied in class. However, instead of moving a paddle up and down along the right wall, there is a \"photon cannon\" fixed in the middle of the right-hand side, and the player can instantaneously set the angle of the cannon and try to shoot. If the photon beam hits the ball, the ball reflects backwards. After being fired, however, the cannon takes 10 time steps to recharge before it can be fired again. Our goal here is to understand how to apply deep Q-learning to this problem.\nThe state of the system is composed of five parts:\n- Ball position x $(1, \\ldots, n)$\n- Ball position y $(1, \\ldots, n)$\n- Ball velocity x $(-1, 1)$\n- Ball velocity y $(-1, 0, 1)$\n- Number of time steps until the cannon is ready to shoot again $(0, \\ldots, 10)$\nThe possible actions at each time step involve both the aim and whether to try to shoot:\n- Cannon angle in degrees $(-60, -30, 0, 30, 60)$\n- Shoot cannon $(1, 0)$\nOur options for solving the game include how we represent the states and actions and how these are mapped to Q-values. We won't worry about the exploration problem here, only about representing $Q$-values. Our learning algorithm performs gradient descent steps on the squared Bellman error with respect to the parameters of the $Q$-values. If we increase $n$ and also include many more angle gradations for the aim, so that $|S|$ and $|A|$ are very large, which of the following architectures would we prefer for representing $Q$-values? Choose from:\nA. $|S|$ input units (one-hot vector for $s$), $|A|$ output units\nB. 5 input units, some $m$ hidden units, $|A|$ output units\nC. $5+2$ input units for the five part state, two-part action, $m$ hidden units and one output unit\nD. 5 input units, some $m$ hidden units, and two output units",
       "Solution": "C. $5+2$ input units for the five part state, two-part action, $m$ hidden units and one output unit "
}