{
       "Semester": "Fall 2019",
       "Question Number": "6",
       "Part": "f",
       "Points": 2.0,
       "Topic": "Decision Trees",
       "Type": "Text",
       "Question": "Consider the following 2D dataset in the (x,y) format: ((0,1), +1), ((1,1),+1), ((-1,-2),-1), ((0,-1),-1), ((1,-1),+1), ((2,-1),-1). Consider the following splits: Split A: x2 >= 0\nSplit B: x1 >= 0:5\nSplit C: x1 >=\udbc0\udc000:5\nPaul Bunyan works to construct trees using the algorithm discussed in the lecture notes, i.e., a greedy algorithm that recursively minimizes weighted average entropy, considering only combinations of the three splits mentioned above. He wants the output of the tree for any input $\\left(x_{1}, x_{2}\\right)$ to be the probability that the input is a positive $(+1)$ example.\nRecall that the weighted average entropy $\\bar{H}$ of a split into subsets $R_{1}$ and $R_{2}$ is\n$$\n\\bar{H}(\\text { split })=\\left(\\text { fraction of points in } R_{1}\\right) \\cdot H\\left(R_{1}\\right)+\\left(\\text { fraction of points in } R_{2}\\right) \\cdot H\\left(R_{2}\\right)\n$$\nwhere the entropy $H\\left(R_{m}\\right)$ of data in a region $R_{m}$ is given by\n$$\nH\\left(R_{m}\\right)=-\\sum_{k} \\hat{P}_{m k} \\log _{2} \\hat{P}_{m k}\n$$\nHere $\\hat{P}_{m k}$ is the empirical probability, which is in this case the fraction of items in region $m$ that are of class $k$.\nPaul decides to consider a particular type of \"random forest,\" which is an ensemble or collection of decision trees, where each tree might only have a subset of split features. Paul restricts his trees to only use Splits A, B, C, or some combination of these splits. The final output of the random forest is the average of the output across the collection of $n$ trees (i.e., with equal weight $1 / n$ for each tree in the random forest). Paul's random forest consists of three trees:\n- The tree consisting of the best single split using feature $x_{2}$ only.\n- The tree consisting of the best single split using feature $x_{1}$ only.\n- The tree consisting of the best two splits (in total) using both features $x_{1}$ and $x_{2}$ (this is the tree from part (a) in this problem).\nFor this random forest, what is the output for the probability that an input point at $(-1,1)$ is a positive $(+1)$ example? Note: Paul's calculations in part (a) may be of help.",
       "Solution": "The first tree corresponds to just Split A on $x_{2}$ from Paul's original tree; this tree gives $p=1.0$ for the point being a positive example. As noted in part (a) the best tree splitting only on $x_{1}$ is Split $\\mathrm{C}$, since $\\bar{H}(C)=0.81$ is less than $\\bar{H}(B)=0.92)$;this tree has $p=0.0$ for the point $(-1,1)$ being a positive example. Finally, the two-split tree as derived in part (a) had $p=1.0$. Thus the aggregate (average) probability is that $(-1,1)$ is a positive example is $p=2 / 3$."
}