{
       "Semester": "Fall 2018",
       "Question Number": "2",
       "Part": "c",
       "Points": 2.5,
       "Topic": "Neural Networks",
       "Type": "Text",
       "Question": "We will consider a neural network with a slightly unusual structure. Let the input $x$ be $d \\times 1$, and let the weights be represented as $k$ vectors $W^{(1)}, \\ldots, W^{(k)}$, each of size $1 \\times d$. Then the final output is\n$$\n\\hat{y}=\\prod_{i=1}^{k} \\sigma\\left(W^{(i)} x\\right)=\\sigma\\left(W^{(1)} x\\right) \\times \\cdots \\times \\sigma\\left(W^{(k)} x\\right)\n$$\nDefine $a^{(j)}=\\sigma\\left(W^{(j)} x\\right)$.\nWhat is $\\partial a^{(j)} / \\partial W^{(j)}$? (Recall that $d \\sigma(v) / d v=\\sigma(v)(1-\\sigma(v))$.)",
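       "Solution": "By the chain rule, writing the scalar $z^{(j)}=W^{(j)} x$ and using $\\partial z^{(j)} / \\partial W^{(j)}=x^{T}$ (a $1 \\times d$ row, matching the shape of $W^{(j)}$):\n$$\n\\frac{\\partial a^{(j)}}{\\partial W^{(j)}}=\\sigma\\left(z^{(j)}\\right)\\left(1-\\sigma\\left(z^{(j)}\\right)\\right) x^{T}=a^{(j)}\\left(1-a^{(j)}\\right) x^{T}\n$$"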
}