{
       "Semester": "Spring 2019",
       "Question Number": "8",
       "Part": "a",
       "Points": 2.666666667,
       "Topic": "Neural Networks",
       "Type": "Text",
       "Question": "In this problem we will investigate regularization for neural networks.\nKim constructs a fully connected neural network with $L=2$ layers using mean squared error (MSE) loss and ReLU activation functions for the hidden layer, and a linear activation for the output layer. The network is trained with a gradient descent algorithm on a data set of $n$ points $\\left\\{\\left(x^{(1)}, y^{(1)}\\right), \\ldots,\\left(x^{(n)}, y^{(n)}\\right)\\right\\}$.\nRecall that the update rule for weights $W^{1}$ can be specified in terms of step size $\\eta$ and the gradient of the loss function with respect to weights $W^{1}$. This gradient can be expressed in terms of the activations $A^{l}$, weights $W^{l}$, pre-activations $Z^{l}$, and partials $\\frac{\\partial L}{\\partial A^{2}}$, $\\frac{\\partial A^{l}}{\\partial Z^{l}}$, for $l=1,2$:\n$$\nW^{1}:=W^{1}-\\eta \\sum_{i=1}^{n} \\frac{\\partial L\\left(h\\left(x^{(i)} ; W\\right), y^{(i)}\\right)}{\\partial W^{1}}\n$$\nwhere $h(\\cdot)$ is the input-output mapping implemented by the entire neural network, and\n$$\n\\frac{\\partial L}{\\partial W^{1}}=\\frac{\\partial Z^{1}}{\\partial W^{1}} \\cdot \\frac{\\partial A^{1}}{\\partial Z^{1}} \\cdot W^{2} \\cdot \\frac{\\partial A^{2}}{\\partial Z^{2}} \\cdot \\frac{\\partial L}{\\partial A^{2}}\n$$\nDerive a new update rule for weights $W^{1}$ which also penalizes the sum of squared values of all individual weights in the network:\n$$\nL^{\\mathrm{new}}=L\\left(h\\left(x^{(i)} ; W\\right), y^{(i)}\\right)+\\lambda\\|W\\|^{2}\n$$\nwhere $\\lambda$ denotes the regularization trade-off parameter. You can express the new update rule as follows:\n$$\nW^{1}:=\\alpha W^{1}-\\eta \\sum_{i=1}^{n} \\frac{\\partial L\\left(h\\left(x^{(i)} ; W\\right), y^{(i)}\\right)}{\\partial W^{1}}\n$$\nwhere $L(\\cdot)$ represents the previous prediction error loss.\nWhat is the value of $\\alpha$ in terms of $\\lambda$ and $\\eta$?",
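       "Solution": "Since $\\partial\\left(\\lambda\\|W\\|^{2}\\right) / \\partial W^{1}=2 \\lambda W^{1}$, the penalty adds a term $-2 \\lambda \\eta W^{1}$ to the update (with the penalty applied once to the summed loss):\n$$\nW^{1}:=(1-2 \\lambda \\eta) W^{1}-\\eta \\sum_{i=1}^{n} \\frac{\\partial L\\left(h\\left(x^{(i)} ; W\\right), y^{(i)}\\right)}{\\partial W^{1}}\n$$\nThus $\\alpha=1-2 \\lambda \\eta$."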
}