{
       "Semester": "Spring 2021",
       "Question Number": "10",
       "Part": "c",
       "Points": 2.0,
       "Topic": "Classifiers",
       "Type": "Text",
       "Question": "Given a set of data $\\mathcal{D}_\\text{train} = \\{(\\ex{x}{i}, \\ex{y}{i})\\}$, a weighted nearest neighbor regressor  has the form \n\\[h(x , \\theta) = \\frac{\\sum_{(\\ex{x}{i}, \\ex{y}{i}) \\in \\mathcal{D}_\\text{train}} f(x, \\ex{x}{i} , \\theta) \\ex{y}{i}}\n{\\sum_{(\\ex{x}{i}, \\ex{y}{i}) \\in \\mathcal{D}_\\text{train}} f(x, \\ex{x}{i} , \\theta) }\n\\,\\,.\\]\nA typical choice for $f$ is\n\\[f(x, x', \\theta) = e^{-\\theta \\|x - x'\\|^2}\n\\]\nwhere $\\theta$ is a scalar and $\\|x-x'\\|^2 = \\sum_{j=1}^d (x_j - x'_j)^2$.\nHow does a weighted nearest neighbor approach compare to linear regression for the same data? Why might we prefer one over the other?\n",
       "Solution": "A linear regression model would fit a straight line through the training data and allow extrapolation. It would predict h(10) to be much larger because that is the trend in the training data (y is becoming larger as x is becoming large).\nThe Heavy Neighbor approach will keep the predictions within the limits of the training data labels (it is a weighted average of the training data points). This would be preferred if we do not want to extrapolate beyond the training data."
}