Abstract: Idioms are permanent phrases that are often formed from tales. They are prevalent in informal discussions and literary works. Their meanings are often quite devoid of composition. The idiom cloze task is a difficult research challenge in Natural Language Processing (NLP). On available datasets, sequence-to-sequence (Seq2Seq) model-based approaches to this problem fared pretty well. However, they lack comprehension of the non-compositional nature of idiomatic idioms. In addition, they do not evaluate both the local and global contexts simultaneously. In this research, we present a BERT-based embedding Seq2Seq model that captures idiomatic phrases and takes global and local contexts into account. Our methodology uses XLNET as the encoder and Roberta to choose the most likely idiom for a given scenario. Experiments conducted on the EPIE Static Corpus dataset demonstrate that our approach outperforms the current state-of-the-art.
Paper Type: short
Research Area: Question Answering
0 Replies
Loading