{
       "Semester": "Spring 2018",
       "Question Number": "1",
       "Part": "c",
       "Points": 4.0,
       "Topic": "Classifiers",
       "Type": "Text",
       "Question": "In this problem, we will consider two-dimensional input data vectors $x=\\left[x_{1}, x_{2}\\right]^{T}$. We will explore the impact of a feature transformation on various learning methods. We will be using the feature transformation $$ \\phi(x)=\\left[1, x_{1}, x_{2}, x_{1} x_{2}, x_{1}^{2}, x_{2}^{2}\\right]^{T} $$ Consider the following $2 \\mathrm{D}$ data sets: Classic XOR: positive: $(0,1),(1,0)$ and negative: $(0,0),(1,1)$. Signed XOR: positive: $(-1,1),(1,-1)$ and negative: $(-1,-1),(1,1)$  For the dataset indicated below, could a one-hidden-layer neural network with $x_{1}$ and $x_{2}$ as inputs, a layer of up to four relu units and a final tanh output unit be trained to separate the data set? The network is specified as follows: $$ \\begin{aligned} &z=W^{T} x+W_{0} \\\\ &o=\\tanh \\left(V^{T} \\operatorname{relu}(z)+V_{0}\\right) \\end{aligned} $$ Assuming you use $k \\leq 4$ hidden units, $W$ is $2 \\times k$, $W_{0}$ is $k \\times 1$ and $V$ is $k \\times 1$ and $V_{0}$ is $1 \\times 1$. Signed XOR: positive: $(-1,1),(1,-1)$ and negative: $(-1,-1),(1,1)$",
       "Solution": " Yes. The simplest way is to have four ReLU units. The $i$ th ReLU unit is responsible for being positive when given the $i$ th input, and negative when given any of the other three inputs. The connection between the $i$ th ReLU unit and the tanh layer should be a large positive number when the $i$ th label is $+1$, and a large negative number when the $i$ th label is $-1$."
}