{
       "Semester": "Fall 2019",
       "Question Number": "4",
       "Part": "e",
       "Points": 1.5,
       "Topic": "Neural Networks",
       "Type": "Text",
       "Question": "Otto N. Coder is exploring different autoencoder architectures. Consider the following autoencoder with input $x \\in \\mathbb{R}^{d}$ and output $y^{\\text {pred }} \\in \\mathbb{R}^{d}$. The autoencoder has one hidden layer with $m$ hidden units: $z^{(1)}, a^{(1)} \\in \\mathbb{R}^{m}$. Assume $x, z^{(2)}$, and $y^{\\text {pred }}$ have dimensions $d \\times 1$. Also let $z^{(1)}$ and $a^{(1)}$ have dimensions $m \\times 1$. \nOtto trains the autoencoder with back-propagation. The loss for a given datapoint $x, y$ is:\n$$\nJ(x, y)=\\frac{1}{2}\\left\\|y^{\\text {pred }}-y\\right\\|^{2}=\\frac{1}{2}\\left(y^{\\text {pred }}-y\\right)^{T}\\left(y^{\\text {pred }}-y\\right)\n$$\nCompute the following intermediate partial derivatives. For the following questions, write your answer in terms of $x, y, y^{p r e d}, W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}, f^{(1)}, f^{(2)}$ and any previously computed or provided partial derivative. Also note that:\n1. Let $\\partial f^{(1)} / \\partial z^{(1)}$ be an $m \\times 1$ matrix, provided to you.\n2. Let $\\partial f^{(2)} / \\partial z^{(2)}$ be a $d \\times 1$ matrix, provided to you.\n3. If $A x=y$ where $A$ is a $m \\times n$ matrix and $x$ is $n \\times 1$ and $y$ is $m \\times 1$, then let $\\partial y / \\partial A=x$.\n4. In your answers below, we will assume multiplications are matrix multiplication; to indicate element-wise multiplication, use the symbol *.\nWrite the gradient descent update step for just $W^{(2)}$ for one datapoint $(x, y)$ given learning rate $\\eta$ and $\\partial J / \\partial W^{(2)}$.",
       "Solution": "W^{(2)}:=W^{(2)}-\\eta \u2202J(x,y)/\u2202W^{(2)}"
}