Abstract: Word embedding methods like word2vec and GloVe have been shown to learn strong representations of words. However, these methods only learn representations for words that appear in the training corpus. This is problematic, as models using these representations need ways to handle unknown and new words, known as out-of-vocabulary (OOV) words. As a result, there have been multiple attempts to learn OOV word representations in a fashion similar to how humans learn new words, using surrounding words (``context clues'') and word roots/subwords. However, most current approaches suffer from two problems. First, these models calculate context clue estimates and subword estimates separately and then combine them shallowly for a final estimate, thereby ignoring potentially important information each type can learn from the other. Second, although subword embeddings are trained to estimate word vectors, we find that these embeddings do not occupy the same space as word embeddings. Current models do not take this into account and do not align the spaces before combining them. In response, we propose Crossword, a transformer-based OOV estimation model that combines context and subwords at the attention level, allowing each type to influence the other for a stronger final estimate. Crossword combines these different sources of information using cross attention, along with strategies to align the subword and context spaces.
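As a rough illustration of what combining context and subwords "at the attention level" can mean, the sketch below applies generic scaled dot-product cross attention in both directions after mapping subword vectors into the context space. It is not the Crossword architecture: the dimensions, the random vectors, the linear alignment map `W_align`, and the mean pooling at the end are all assumptions made only for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product attention: each query row attends over the
    # rows of keys_values, which serve as both keys and values here.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ keys_values, weights

rng = np.random.default_rng(0)
d_model, d_sub = 8, 6                        # hypothetical dimensions
context = rng.normal(size=(5, d_model))      # 5 context-word vectors (made up)
subwords = rng.normal(size=(3, d_sub))       # 3 subword vectors in another space (made up)

# Assumed alignment step: a linear map from subword space into context space.
# In a trained model this map would be learned; here it is random.
W_align = rng.normal(size=(d_sub, d_model)) / np.sqrt(d_sub)
subwords_aligned = subwords @ W_align

# Each source attends to the other, so each can influence the other.
ctx_out, ctx_w = cross_attention(context, subwords_aligned)
sub_out, sub_w = cross_attention(subwords_aligned, context)

# Assumed pooling: average all attended vectors into one OOV estimate.
oov_estimate = np.concatenate([ctx_out, sub_out]).mean(axis=0)
```

In contrast, a shallow combination would compute one vector from the context alone and one from the subwords alone, then average or gate the two, with no interaction before that final step.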