{
       "Question number": "5",
       "Sub-Question number": "b",
       "Question": "We'll consider here a simple one-dimensional convolutional neural network layer. This is a feature map created by a single filter whose parameters we must learn. The filter represents a local pattern detector that is applied in every position of the input signal. The feature map therefore transforms an input vector (one dimensional signal) into another vector (one-dimensional feature map). To train such a layer on its own, i.e., not as part of a bigger network as we typically do, we can imagine having training pairs $(x, y)$ where $x$ is the input signal as a vector and $y$ is a binary vector representing whether the relevant pattern appeared in a particular position or not. Specifically,\n- Input $x$ is a one-dimensional vector of length $d$.\n- Target $y$ is also a one-dimensional vector of length. $d$. One \"pixel\" in the output, $y_{j}$, has value 1 if the input pixels $x_{j-1}, x_{j}, x_{j+1}$, centered at $j$, exhibit the target pattern and 0 if they do not.\n- The filter is represented by a weight vector $w$ consisting of three values.\n- The output of the network is a vector $\\hat{y}$ whose $j^{t h}$ coordinate (pixel) is $\\hat{y}_{j}=\\sigma\\left(z_{j}\\right.$ ) where $z_{j}=\\left[x_{j-1} ; x_{j} ; x_{j+1}\\right]^{T} w$ and $\\sigma(\\cdot)$ is the sigmoid function. Assume that $x_{0}$ and $x_{d+1}$ are 0 for the purposes of computing outputs.\n- We have a training set $D=\\left(x^{(1)}, y^{(1)}\\right), \\ldots,\\left(x^{(n)}, y^{(n)}\\right)$.\n- We measure the loss between the target binary vector $y$ and the network output $\\hat{y}$ pixel by pixel using cross-entropy (Negative Log-Likelihood or NLL). 
The aggregate loss over the whole training set is\n$$\nL(w, D)=\\sum_{i=1}^{n} \\sum_{j=1}^{d} \\operatorname{NLL}\\left(y_{j}^{(i)}, \\hat{y}_{j}^{(i)}\\right)\n$$\nProvide a formula for $\\nabla_{w} \\mathrm{NLL}\\left(y_{j}, \\hat{y}_{j}\\right)$, the gradient of the loss at pixel $j$ of an example with respect to $w=\\left[w_{1}, w_{2}, w_{3}\\right]^{T}$, in terms of $x$, $y$, and $z$ values only.",
       "Solution": "$$\n\\left(\\sigma\\left(z_{j}\\right)-y_{j}\\right)\\left[\\begin{array}{c}\nx_{j-1} \\\\\nx_{j} \\\\\nx_{j+1}\n\\end{array}\\right]\n$$"
}