{
       "Semester": "Fall 2019",
       "Question Number": "6",
       "Part": "g",
       "Points": 2.0,
       "Topic": "Decision Trees",
       "Type": "Text",
       "Question": "Consider the following 2D dataset in the (x,y) format: ((0,1), +1), ((1,1),+1), ((-1,-2),-1), ((0,-1),-1), ((1,-1),+1), ((2,-1),-1). Consider the following splits: Split A: x2 >= 0\nSplit B: x1 >= 0:5\nSplit C: x1 >=\udbc0\udc000:5\nPaul Bunyan works to construct trees using the algorithm discussed in the lecture notes, i.e., a greedy algorithm that recursively minimizes weighted average entropy, considering only combinations of the three splits mentioned above. He wants the output of the tree for any input $\\left(x_{1}, x_{2}\\right)$ to be the probability that the input is a positive $(+1)$ example.\nRecall that the weighted average entropy $\\bar{H}$ of a split into subsets $R_{1}$ and $R_{2}$ is\n$$\n\\bar{H}(\\text { split })=\\left(\\text { fraction of points in } R_{1}\\right) \\cdot H\\left(R_{1}\\right)+\\left(\\text { fraction of points in } R_{2}\\right) \\cdot H\\left(R_{2}\\right)\n$$\nwhere the entropy $H\\left(R_{m}\\right)$ of data in a region $R_{m}$ is given by\n$$\nH\\left(R_{m}\\right)=-\\sum_{k} \\hat{P}_{m k} \\log _{2} \\hat{P}_{m k}\n$$\nHere $\\hat{P}_{m k}$ is the empirical probability, which is in this case the fraction of items in region $m$ that are of class $k$.\nWould you expect the accuracy for Paul's random forest generated decision to be better, or for the decision made by Paul's single two-split decision tree from part (a) to be better, when evaluated against held-out test data? Explain.",
       "Solution": "We would expect that the random forest generated decision will generalize better. Using all the features available to us can lead to over-fitting. For random forests, although each individual decision tree can have a higher error rate on the training data, the averaging effect (or majority vote for classification trees) can serve as a filter on noise vs. true signal."
}