{
       "Question number": "6",
       "Sub-Question number": "d",
       "Question": "We are going to consider two different simple convolutional networks over one dimensional (vector) inputs. Each network has a single convolutional layer with a single filter of size 3 and stride 1 . Let $\\left(z_{1}, \\ldots, z_{d}\\right)$ be the output of this convolutional layer, i.e., $\\left(z_{1}, \\ldots, z_{d}\\right)$ represents the feature map constructed from the input vector $\\left(x_{1}, \\ldots, x_{d}\\right)$. For simplicity, you can think of $z_{j}$ just as a linear map $z_{j}=\\left[x_{j-1} ; x_{j} ; x_{j+1}\\right]^{T} w$ where $w$ are the filter parameters. Our two networks differ in terms of how the feature map values are pooled to a single output value.\n\nNetwork A has a single max-pooling layer with input size $d$, so that the output of the network $\\hat{y}=\\sigma\\left(\\max \\left(z_{1}, \\ldots, z_{d}\\right)\\right)$ where $\\sigma(\\cdot)$ is the sigmoid function.\nNetwork B has a single min-pooling layer with input size $d$, so that the output of the network\n$$\n\\hat{y}=\\sigma\\left(\\min \\left(z_{1}, \\ldots, z_{d}\\right)\\right)\n$$\nWhen the filter's output value is high it represents a positive detection of some pattern of interest. Assume for simplicity that all $z_{1}, \\ldots, z_{d}$ have distinct values (we can ignore the corner cases where some of the values are equal). What is $\\partial \\hat{y} / \\partial z_{i}$ for network A?",
       "Solution": "$\\sigma\\left(z_{i}\\right)\\left(1-\\sigma\\left(z_{i}\\right)\\right)$ if $z_{i}=\\max \\left(z_{1}, \\ldots, z_{d}\\right)$, and 0 otherwise."
}