{
       "Semester": "Spring 2021",
       "Question Number": "5",
       "Part": "b",
       "Points": 1.0,
       "Topic": "Loss Functions",
       "Type": "Image",
       "Question": "We have looked at many machine-learning algorithms with hyper-parameters. Varying each of them has an effect on the loss on both the training data and on unseen testing data. What plot would describe the most typical behavior for the testing error dependency on the step-size in gradient descent for neural networks (assuming a \u2000fixed number of \niterations): monotonically decreasing function, convex parabola, monotonically increasing function, monotonically decreasing step function, monotonically increasing step function? If none of them is appropriate, explain.",
       "Solution": "convex parabola"
}