{
       "Semester": "Spring 2018",
       "Question Number": "4",
       "Part": "a.iii",
       "Points": 2.0,
       "Topic": "Classifiers",
       "Type": "Text",
       "Question": "Consider a classification problem in which there are $K$ possible output classes, $1, \\ldots, K$. We have studied using NLL as a loss function in such cases, but that assumes that all mistaken classifications are equally significant. Instead, we'll consider a case where some mistakes are worse than others (e.g., mis-identifying a cow as a horse is not as bad as calling it a mouse). Define the cost matrix $c_{g, a}$ to be the cost for guessing class $g$ when the actual class is $a$. For convenience, we'll write $c_{j}$ for the column of the matrix $\\left[c_{1, j}, c_{2, j}, \\ldots, c_{K, j}\\right]^{T}$ representing the costs of all the possible guesses when $j$ is the actual value. We will use a simple neural network with a softmax activation function, so our prediction $p$ will be a $K \\times 1$ vector: $$ \\begin{aligned} &p=\\operatorname{softmax}(z) \\\\ &z=W^{T} x \\end{aligned} $$ Assume inputs are $d \\times 1$ so $W$ is $d \\times K$. Our loss function, for a prediction vector $p$ when the target output is value $y \\in\\{1, \\ldots, K\\}$, is the expected cost of the prediction: $$ L_{c}(p, y)=\\sum_{k=1}^{K} p_{k} c_{k y}=p^{T} c_{y} $$ So, the overall training objective is to minimize, over a data set of $n$ points, $$ J_{c}(W)=\\sum_{i=1}^{n} L_{c}\\left(p^{(i)}, y^{(i)}\\right) $$ (a) Select which of the following cost matrices $c$ corresponds to each situation described below. A. $\\left[\\begin{array}{llll}1 & 0 & 0 & 0 \\\\ 0 & 1 & 0 & 0 \\\\ 0 & 0 & 1 & 0 \\\\ 0 & 0 & 0 & 1\\end{array}\\right]$ B. $\\left[\\begin{array}{llll}0 & 1 & 1 & 1 \\\\ 1 & 0 & 1 & 1 \\\\ 1 & 1 & 0 & 1 \\\\ 1 & 1 & 1 & 0\\end{array}\\right]$ C. $\\left[\\begin{array}{llll}0 & 0 & 1 & 1 \\\\ 0 & 0 & 1 & 1 \\\\ 1 & 1 & 0 & 0 \\\\ 1 & 1 & 0 & 0\\end{array}\\right]$ D. $\\left[\\begin{array}{cccc}0 & .5 & .5 & .5 \\\\ 2 & 0 & 1 & 1 \\\\ 2 & 1 & 0 & 1 \\\\ 2 & 1 & 1 & 0\\end{array}\\right]$ E. $\\left[\\begin{array}{cccc}0 & 2 & 2 & 2 \\\\ .5 & 0 & 1 & 1 \\\\ .5 & 1 & 0 & 1 \\\\ .5 & 1 & 1 & 0\\end{array}\\right]$. Select which matrix (A, B, C, D, E) corresponds to a situation where it is worse to miss predicting a particular bad outcome than to predict that outcome by mistake.",
       "Solution": "D"
}