- Abstract: This work introduces a simple network for producing character aware word embeddings. Position agnostic and position aware character embeddings are combined to produce an embedding vector for each word. The learned word representations are shown to be very sparse and facilitate improved results on language modeling tasks, despite using markedly fewer parameters, and without the need to apply dropout. A final experiment suggests that weight sharing contributes to sparsity, increases performance, and prevents overfitting.
- TL;DR: A fully connected architecture is used to produce word embeddings from character representations, outperforms traditional embeddings and provides insight into sparsity and dropout.
- Keywords: natural language processing, word embeddings, language models, neural network, deep learning, sparsity, dropout