{
       "Semester": "Spring 2019",
       "Question Number": "9",
       "Part": "c",
       "Points": 2.0,
       "Topic": "RNNs",
       "Type": "Text",
       "Question": "We want to make an RNN to translate English to Martian. We have a training set of pairs $\\left(e^{(i)}, m^{(i)}\\right)$, where $e^{(i)}$ is a sequence of length $J^{(i)}$ of English words and $m^{(i)}$ is a sequence of length $K^{(i)}$ of Martian words. The sequences, even within a pair, do not need to be of the same length, i.e., $J^{(i)}$ need not equal $K^{(i)}$. We are considering two different strategies for turning this into a transduction (sequence-to-sequence) learning problem for an RNN.\nMethod 1: Construct a training-sequence pair $(x, y)$ from an example $(e, m)$ by letting $x=(e_{1}, e_{2}, \\ldots, e_{L}, \\text{stop})$ and $y=(m_{1}, m_{2}, \\ldots, m_{L}, \\text{stop})$. In Method 1, we assume that if the original $e$ and $m$ had different numbers of words, then the shorter sentence is padded with enough time-wasting words (\"ummm\" for English, \"grlork\" for Martian) so that they now have equal length $L$. Any needed padding words are inserted at the end of $e^{(i)}$ and at the start of $m^{(i)}$.\nMethod 2: Construct a training-sequence pair $(x, y)$ from an example $(e, m)$ by letting $x=(e_{1}, e_{2}, \\ldots, e_{J}, \\text{stop}, \\text{blank}, \\ldots, \\text{blank})$ and $y=(\\text{blank}, \\ldots, \\text{blank}, m_{1}, m_{2}, \\ldots, m_{K}, \\text{stop})$. In Method 2, blanks are inserted at the end of $e$ and at the start of $m$ so that $x$ and $y$ both have length $J+K+1$.\nWhich method is likely to need a higher-dimensional state? Explain why.",
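       "Solution": "Method 2 is likely to need a higher-dimensional state. Because the network reads the entire English sentence $e$ before emitting any Martian word, its state at the stop token must hold a representation of the full input sentence. In Method 1, each output word is produced as the corresponding input word is read, so the state can be smaller: it only needs to support mapping individual words or short sub-sequences of words to the corresponding output words or sub-sequences."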
}