Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
UNSUPERVISED SENTENCE EMBEDDING USING DOCUMENT STRUCTURE-BASED CONTEXT
Taesung Lee, Youngja Park
Feb 15, 2018 (modified: Feb 15, 2018)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:We present a new unsupervised method for learning general-purpose sentence embeddings.
Unlike existing methods which rely on local contexts, such as words
inside the sentence or immediately neighboring sentences, our method selects, for
each target sentence, influential sentences in the entire document based on a document
structure. We identify a dependency structure of sentences using metadata
or text styles. Furthermore, we propose a novel out-of-vocabulary word handling
technique to model many domain-specific terms, which were mostly discarded by
existing sentence embedding methods. We validate our model on several tasks
showing 30% precision improvement in coreference resolution in a technical domain,
and 7.5% accuracy increase in paraphrase detection compared to baselines.
TL;DR:To train a sentence embedding using technical documents, our approach considers document structure to find broader context and handle out-of-vocabulary words.