{
       "Semester": "Spring 2019",
       "Question Number": "6",
       "Part": "g",
       "Points": 1.75,
       "Topic": "Classifiers",
       "Type": "Text",
       "Question": "After taking 6.036, Bob decides to train a recommender system to predict what ratings different customers will give to different movies. Currently, he knows of three really popular movies, and he knows of two potential customers who have ranked some of these movies. The data matrix currently looks like: $Y=[[2, ?, 3],[4,2, ?]]$ where, as in class, rows correspond to customers and columns correspond to movies, and ? indicates a missing or unknown ranking. He decides to find a low rank factorization of $Y$ using the alternating least squares algorithm implemented in class. Assume for this question that offsets are set to $0.$\nBob is happy about what he has accomplished, until he realizes that there are a bunch of movies and users that he still needs to add to his database! He sees that his database will slowly grow over time, and that it will be time-consuming to train a completely new model every single time he updates his database. If Bob has an $m \\times n$ data matrix which he wants to find a rank $k$ factorization for, his analysis indicates that the worst-case run-time (in terms of number of expensive multiplications) of performing alternating least squares for $t$ iterations (where each iteration updates both $U$ and $V)$ will be $O(k^{2}*m*n*t)$.\nInstead, Bob comes upon the following idea: whenever he gets information about a new movie, he adds an extra row to $V$ but does not alter the existing entries of $U$ or $V$. He then finds the values of the entries in that extra row that minimize the objective function (with no regularization). He performs a similar procedure when he gets a new user, but instead adds an extra row to $U$. \nBob modifies this procedure so that he still adds new movies and users in this way, but after every 100 new additions, he retrains $U$ and $V$ from scratch using alternating least squares. Would you expect that this method would make better predictions than if we just used Bob's original procedure? Explain.",
       "Solution": "Yes.\n\nWhenever we retrain $U$ and $V$ from scratch, we are minimizing the objective function over all variables in the problem (all entries of $U$ and $V$ ) so the minimum of the objective will be lower than we could obtain by just retraining a subset of variables, as we were doing in the previous part to lower computational costs. Name:\n\nThus, this method will make better predictions"
}