{
       "Semester": "Spring 2021",
       "Question Number": "10",
       "Part": "e",
       "Points": 2.0,
       "Topic": "Classifiers",
       "Type": "Text",
       "Question": "Given a set of data $\\mathcal{D}_\\text{train} = \\{(\\ex{x}{i}, \\ex{y}{i})\\}$, a weighted nearest neighbor regressor  has the form \n\\[h(x , \\theta) = \\frac{\\sum_{(\\ex{x}{i}, \\ex{y}{i}) \\in \\mathcal{D}_\\text{train}} f(x, \\ex{x}{i} , \\theta) \\ex{y}{i}}\n{\\sum_{(\\ex{x}{i}, \\ex{y}{i}) \\in \\mathcal{D}_\\text{train}} f(x, \\ex{x}{i} , \\theta) }\n\\,\\,.\\]\nA typical choice for $f$ is\n\\[f(x, x', \\theta) = e^{-\\theta \\|x - x'\\|^2}\n\\]\nwhere $\\theta$ is a scalar and $\\|x-x'\\|^2 = \\sum_{j=1}^d (x_j - x'_j)^2$. How does a wieghted nearest neighbor approach compare to linear regression for the same data? Why might we prefer one over the other? If we were only ever going to have to make predictions on the training data, what value of $\\theta$ would tend to minimize our prediction error? Dino thinks the denominator in the definition of h is not useful and it would be fine to remove it. Is Dino right?",
       "Solution": "No. The denominator is needed for normalization (to keep the prediction\nin the same range of y\u2019s as the training data)."
}