{
       "Question number": "6",
       "Sub-Question number": "2",
       "Question": "Assume you are given an $L$-layer neural network trained to minimize a loss function $\\mathcal{L}$:\n\n$$\n\\begin{aligned}\nh(\\mathbf{x}) &=\\mathbf{w}^{\\top} \\phi_{1}(\\mathbf{x}) \\\\\n\\phi_{1}(\\mathbf{x}) &=\\sigma\\left(\\mathbf{U}_{1} \\phi_{2}(\\mathbf{x})\\right) \\\\\n& \\vdots \\\\\n\\phi_{\\ell}(\\mathbf{x}) &=\\sigma\\left(\\mathbf{U}_{\\ell} \\phi_{\\ell+1}(\\mathbf{x})\\right) \\\\\n& \\vdots \\\\\n\\phi_{L}(\\mathbf{x}) &=\\sigma\\left(\\mathbf{U}_{L} \\mathbf{x}\\right)\n\\end{aligned}\n$$\n\nNote that the subscript of $\\phi$ starts at 1 at the output end of the network and increases to $L$ as we move back toward the input, with $\\phi_{L+1}(\\mathbf{x})=\\mathbf{x}$. The derivative of $\\sigma(z)$ is given as $\\sigma^{\\prime}(z)$. Express $\\delta_{\\ell+1}$ as a function of $\\delta_{\\ell}$, assuming $1<\\ell<L$.",
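       "Solution": "Write the pre-activation of layer $\\ell$ as $a_{\\ell}=\\mathbf{U}_{\\ell} \\phi_{\\ell+1}$, so that $\\phi_{\\ell}=\\sigma(a_{\\ell})$, and let $\\delta_{\\ell}=\\partial \\mathcal{L} / \\partial a_{\\ell}$. Then\n\n$$\n\\begin{aligned}\n\\delta_{\\ell+1} &=\\frac{\\partial \\mathcal{L}}{\\partial a_{\\ell+1}} \\\\\n&=\\frac{\\partial \\mathcal{L}}{\\partial \\phi_{\\ell+1}} \\frac{\\partial \\phi_{\\ell+1}}{\\partial a_{\\ell+1}} \\\\\n&=\\frac{\\partial \\mathcal{L}}{\\partial a_{\\ell}} \\frac{\\partial a_{\\ell}}{\\partial \\phi_{\\ell+1}} \\frac{\\partial \\phi_{\\ell+1}}{\\partial a_{\\ell+1}} \\\\\n&=\\sigma^{\\prime}\\left(a_{\\ell+1}\\right) \\odot \\mathbf{U}_{\\ell}^{\\top} \\delta_{\\ell} \\\\\n&=\\sigma^{\\prime}\\left(\\mathbf{U}_{\\ell+1} \\phi_{\\ell+2}\\right) \\odot \\mathbf{U}_{\\ell}^{\\top} \\delta_{\\ell}\n\\end{aligned}\n$$\n\nThe fourth line uses $\\partial a_{\\ell} / \\partial \\phi_{\\ell+1}=\\mathbf{U}_{\\ell}$, which gives $\\partial \\mathcal{L} / \\partial \\phi_{\\ell+1}=\\mathbf{U}_{\\ell}^{\\top} \\delta_{\\ell}$ for column vectors, and $\\partial \\phi_{\\ell+1} / \\partial a_{\\ell+1}=\\operatorname{diag}(\\sigma^{\\prime}(a_{\\ell+1}))$ because $\\sigma$ is applied elementwise; multiplying by this diagonal matrix is the elementwise product $\\odot$."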
}