{
       "Semester": "Spring 2019",
       "Question Number": "4",
       "Part": "e",
       "Points": 2.0,
       "Topic": "CNNs",
       "Type": "Text",
       "Question": "Conne von Lucien has many pictures from her trip to Flatland and wants to determine which ones have her in the image. All of the pictures are arrays of size 4x1, with array values of either 0 or 1. Conne looks like the vector [1,0,1] in one dimension, so if a picture contains the pattern [1,0,1] anywhere inside it, it should be classified as a positive example, otherwise as a negative example.\nFortunately, you learned about CNNs and have helped Conne by designing the following network architecture with three layers:\n1. A convolutional layer with one filter W that is size 3x1, and stride 1, and a single bias w_0 (where the output pixel corresponds to the input pixel that the filter is centered on). Input values of 0 should be assumed beyond the boundaries of the input.\n2. A max-pooling layer P with size 2x1 and stride 2.\n3. A fully connected layer $\\sigma(\\cdot)$ with a single output unit having a sigmoidal activation function.\nConne decides to use the neural network code as written by a $6.036$ student for the $6.036$ homework (and that actually was a correct implementation) to train her CNN using SGD. The sgd procedure may be called multiple times from elsewhere (e.g., to implement multiple epochs of SGD). Conne thinks she has a better sgd python procedure than that given in the package; her code is:\ndef sgd (nn , X, Y, iters =100 , lrate =0.005) :\n    D, N = X.shape\n    sum loss = 0\n    for k in range(iters) :\n        Xt = X[ : , k : k+1]\n        Yt = Y[ : , k : k+1]\n        Ypred = nn.forward(Xt)\n        sum_loss += nn.loss.forward(Ypred , Yt)\n        err = nn.loss.backward()\n        nn.backward(err)\n        nn.sgd_step(lrate)\nHere, $n n$ is an instance of the Sequential class implementing the CNN. She knows from the unit tests that the nn routines function properly. 
In particular, nn.forward properly computes the predicted outputs Ypred from input data Xt, nn.loss.forward properly computes the forward loss, nn.loss.backward properly computes the backward loss, nn.backward properly computes the backward gradients, and nn.sgd_step properly applies an SGD update step with the specified learning rate lrate. The $N$ input data points $X$, each of dimension $D$, and the labels $Y$ are also known to be correct.\nHowever, Conne's procedure consistently gives poor results (and occasionally throws errors) compared with the 6.036 student's correct SGD routine when run with identical arguments.\nWhy? Specify the line(s) that have errors, and describe how the code should be improved to do as well as the 6.036 student's correct implementation.",
       "Solution": "Lines 5 and 6. The SGD algorithm needs a random data point to be selected for the gradient computation. Thus, the Xt and Yt assignments should draw from a randomly chosen $j$, e.g.\nfor k in range ( iters ) :\n    j = np.random.randint(N)\n    Xt = X[ : , j : j +1]\n    Yt = Y[ : , j : j +1]\n. . .\nNote that Conne's code may throw errors when iters $\\geq N$."
}