{
       "Semester": "Spring 2022",
       "Question Number": "8",
       "Part": "e.ii",
       "Points": 1.0,
       "Topic": "CNNs",
       "Type": "Text",
       "Question": "We are going to consider two different simple convolutional networks over one dimensional (vector) inputs. Each network has a single convolutional layer with a single filter of size 3 and stride 1 . Let $\\left(z_{1}, \\ldots, z_{d}\\right)$ be the output of this convolutional layer, i.e., $\\left(z_{1}, \\ldots, z_{d}\\right)$ represents the feature map constructed from the input vector $\\left(x_{1}, \\ldots, x_{d}\\right)$. For simplicity, you can think of $z_{j}$ just as a linear map $z_{j}=\\left[x_{j-1} ; x_{j} ; x_{j+1}\\right]^{T} w$ where $w$ are the filter parameters. Our two networks differ in terms of how the feature map values are pooled to a single output value.\n\nNetwork A has a single max-pooling layer with input size $d$, so that the output of the network $\\hat{y}=\\sigma\\left(\\max \\left(z_{1}, \\ldots, z_{d}\\right)\\right)$ where $\\sigma(\\cdot)$ is the sigmoid function.\nNetwork B has a single min-pooling layer with input size $d$, so that the output of the network\n$$\n\\hat{y}=\\sigma\\left(\\min \\left(z_{1}, \\ldots, z_{d}\\right)\\right)\n$$\nWhen the filter's output value is high it represents a positive detection of some pattern of interest. Now, suppose we are just given a single training pair $(x, y)$ where the target $y$ is binary $0 / 1$. The loss that we are minimizing is again just\n$$\n\\mathrm{NLL}(y, \\hat{y})=-y \\log \\hat{y}-(1-y) \\log (1-\\hat{y})\n$$\nwhich is minimized when $\\hat{y}$ matches the target $y$. We are interested in understanding qualitatively how the filter parameters $w$ get updated in the two networks if we use simple gradient descent to minimize $\\operatorname{NLL}(y, \\hat{y})$. Specify whether the behavior would occur in:\n- Which network (A, B, or it doesn't matter)\n- Target $y$ (1, 0, or it doesn't matter). 
\nThe behavior is:\nAfter each step of gradient descent, the filter weights $w$ change so that their dot product with the values of one particular sub-region $\\left[x_{j-1} ; x_{j} ; x_{j+1}\\right]$ of the input decreases.",
       "Solution": "B, 0"
}