CNN-based Context Sensitive Lemmatization

2019 (modified: 05 Nov 2021), COMAD/CODS 2019, Readers: Everyone
Abstract: Morphological analysis is widely regarded as an important task in natural language processing (NLP). Lemmatization is a major morphological operation that finds the dictionary headword/root of a surface word. In context-sensitive languages, the context of a surface word plays a key role in determining its lemma. So far, only a limited number of neural lemmatizers have been developed for in-context lemmatization, and these models take the entire sentence as context. In this research, we hypothesize that a limited context is sufficient for lemmatization and therefore use a convolutional neural network to accomplish the task. Our proposed BLSTM-CNN lemmatizer is evaluated on four languages: two Indic (Hindi and Marathi) and two European (French and Spanish). The experimental results show that considering a smaller context indeed achieves better performance than approaches that take the whole sentence as context when determining the lemmas of surface words.