{
       "Semester": "Spring 2019",
       "Question Number": "3",
       "Part": "d",
       "Points": 1.5,
       "Topic": "MDPs",
       "Type": "Text",
       "Question": "Tic-tac-toe is a paper-and-pencil game for two players, X and O, who take turns marking the spaces in a 3\u00d73 grid. The player who succeeds in placing three of their marks sequentially in a horizontal, vertical, or diagonal row wins the game. In this question, we'll consider a solitaire version of tic-tac-toe, in which we assume:\n\u2022 We are the X player;\n\u2022 The O player is a fixed (but possibly stochastic) algorithm;\n\u2022 The initial state of the board is empty, and X has the first move;\n\u2022 We can select any of the nine squares on our turn;\n\u2022 We don't know the strategy of the O player or the reward function used by O.\nWe place an X in an empty square, then an O appears in some other square, and then it's our turn to play again. We receive a +1 reward for getting three X's in a row, reward -1 if there are three O's in a row, and reward 0 otherwise. If we select a square that already has an X or an O in it, nothing changes and it's still our turn.\nWe can model this problem as a Markov decision process in several different ways. Here are some possible choices for the state space.\n\u2022 Jody suggests letting the state space be all possible 3\u00d73 grids in which each square contains one of the following: a space, an O, or an X.\n\u2022 Dana suggests using all possible 3\u00d73 grids in which each square contains one of the three options (a space, an O, or an X), and there is an equal number of O's and X's.\n\u2022 Chris suggests using all 3\u00d73 tic-tac-toe game grids which appear in games where the players both employ optimal strategies.\nYou get to sit and watch an expert player (who always makes optimal moves) play this game for a long time, and you observe the sequence of state-action pairs that occur in many games. Which of the following machine-learning problem formulations is most appropriate for you to learn how to play the game? For the item you select, provide the specified additional information (where not \"none\").\n1. supervised regression (describe the loss function)\n2. supervised classification (describe the loss function)\n3. reinforcement learning of a policy (none)\n4. reinforcement learning of a value function (none)\nExplain your answer.",
       "Solution": "Supervised classification. The observed state-action pairs are labeled examples: the input is the board state and the output is the expert's move, which is one of the nine squares, so you learn a classifier that maps each state to the square to mark next. The loss function could be the negative log-likelihood of the expert's move under your predicted distribution over moves."
}