{
       "Semester": "Fall 2018",
       "Question Number": "2",
       "Part": "a",
       "Points": 2.5,
       "Topic": "Neural Networks",
       "Type": "Text",
       "Question": "We will consider a neural network with a slightly unusual structure. Let the input $x$ be $d \\times 1$ and let the weights be represented as $k$ row vectors of size $1 \\times d$, $W^{(1)}, \\ldots, W^{(k)}$. Then the final output is\n$$\n\\hat{y}=\\prod_{i=1}^{k} \\sigma\\left(W^{(i)} x\\right)=\\sigma\\left(W^{(1)} x\\right) \\times \\cdots \\times \\sigma\\left(W^{(k)} x\\right)\n$$\nDefine $a^{(j)}=\\sigma\\left(W^{(j)} x\\right)$.\nWhat is $\\partial L(\\hat{y}, y) / \\partial a^{(j)}$ for some $j$? Since we have not specified the loss function, you can express your answer in terms of $\\partial L(\\hat{y}, y) / \\partial \\hat{y}$.",
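       "Solution": "By the chain rule, and since $\\hat{y}=a^{(j)} \\prod_{i \\neq j} a^{(i)}$ is linear in $a^{(j)}$,\n$$\n\\frac{\\partial L(\\hat{y}, y)}{\\partial a^{(j)}}=\\frac{\\partial L(\\hat{y}, y)}{\\partial \\hat{y}} \\frac{\\partial \\hat{y}}{\\partial a^{(j)}}=\\frac{\\partial L(\\hat{y}, y)}{\\partial \\hat{y}} \\prod_{i \\neq j} \\sigma\\left(W^{(i)} x\\right)\n$$"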
}