{
       "Semester": "Fall 2019",
       "Question Number": "8",
       "Part": "a",
       "Points": 2.0,
       "Topic": "Regression",
       "Type": "Text",
       "Question": "We previously examined ridge regression, where a regularizer term $R_{\\lambda}(\\theta)$ is added to a sum-of-squares loss to form the $J_{1}$ objective function below. Throughout this problem, we assume zero offset $\\theta_{0}=0$ and linear models of output $y$ as a function of input $x$. $$ \\begin{aligned} J_{1}(\\theta) &=\\frac{1}{2} \\sum_{i=1}^{n}\\left(y_{i}-\\theta \\cdot x_{i}\\right)^{2}+R_{\\lambda}(\\theta) \\\\ &=\\frac{1}{2} \\sum_{i=1}^{n}\\left(y_{i}-\\theta \\cdot x_{i}\\right)^{2}+\\frac{\\lambda}{2}\\|\\theta\\|^{2} \\end{aligned} $$ Laz Zo prefers an alternative approach (called \"lasso\" regularization), where a different regularizer $R_{\\alpha}(\\theta)$ is added to the sum-of-squares loss: $$ \\begin{aligned} J_{2}(\\theta) &=\\frac{1}{2} \\sum_{i=1}^{n}\\left(y_{i}-\\theta \\cdot x_{i}\\right)^{2}+R_{\\alpha}(\\theta) \\\\ &=\\frac{1}{2} \\sum_{i=1}^{n}\\left(y_{i}-\\theta \\cdot x_{i}\\right)^{2}+\\alpha \\sum_{j=1}^{k}\\left|\\theta_{j}\\right| \\end{aligned} $$ Consider the two-dimensional case, $k=2$, so that our vector $\\theta$ has just two components, $\\theta_{1}$ and $\\theta_{2}$. Suppose also that $\\theta_{1}>\\theta_{2}$ and both are positive $\\left(\\theta_{1}, \\theta_{2}>0\\right)$. We are interested in the behavior of $R_{\\alpha}(\\theta)$ and $R_{\\lambda}(\\theta)$. Assume both $\\lambda$ and $\\alpha$ are positive. First consider the lasso regularizer for this specific case: $$ R_{\\alpha}(\\theta)=\\alpha \\sum_{j=1}^{k}\\left|\\theta_{j}\\right|=\\alpha\\left(\\theta_{1}+\\theta_{2}\\right) $$ where the last equality holds because both $\\theta_{1}$ and $\\theta_{2}$ are positive. We consider reducing $\\theta_{1}$ by a small $\\delta$, where $\\delta>0$, versus reducing $\\theta_{2}$ by $\\delta$. (You can assume $\\delta$ is smaller than both $\\theta_{1}$ and $\\theta_{2}$.)\nWhich statement is true if our goal is to minimize $R_{\\alpha}(\\theta)$? Choose one of the following options:\nIt is better to reduce $\\theta_{1}$ by $\\delta$\nIt is better to reduce $\\theta_{2}$ by $\\delta$\nIt is equally beneficial to reduce $\\theta_{1}$ or $\\theta_{2}$ by $\\delta$.",
       "Solution": "It is equally beneficial to reduce $\\theta_{1}$ or $\\theta_{2}$ by $\\delta$."
}