AGRONER: An unsupervised agriculture named entity recognition using weighted distributional semantic model
Abstract: In this work, we propose a novel weighted distributional semantic model for unsupervised Named Entity Recognition (NER) in domain-specific texts, focusing on the agricultural domain. Developing accurate agriculture NER models requires overcoming several challenges, including the lack of annotated data, domain-specific vocabulary, entity ambiguity, and contextual variation. The proposed approach is completely unsupervised and utilizes an extended BERT model with LDA topic modeling ($exBERT\_LDA^{+}$) for NER. The proposed Agricultural Named Entity Recognition (AGRONER) model focuses on identifying six major entities: disease, soil, pathogen, pesticide, crops, and place. The first four entities are recognized using the proposed algorithm, while we utilize the AGROVOC dictionary for crop entities and Geocoding APIs for place entities. Due to the absence of a benchmark dataset in the agriculture domain, we created a corpus of 30,000 sentences extracted from recognized agriculture sites. For the evaluation, we used a test corpus of 700 sentences that include 1690 entity names. The labeled entities were then manually checked to evaluate the prediction accuracy. The proposed approach achieves a macro-average F-measure of 80.43%, which is promising for unsupervised domain-specific entity labeling.
We performed ablation studies, in which the proposed model exhibited relative improvements of 31.56% and 26.11% in F-measure over BERT without LDA ($BERT\_LDA^{-}$) and extended BERT without LDA ($exBERT\_LDA^{-}$), respectively. Experimental results show the efficacy of the proposed approach in labeling named entities in an unsupervised set-up for the agricultural domain. Further, the approach can be easily extended to recognize more domain-specific entities.
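For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how a macro-average F-measure and a relative percentage improvement are conventionally computed. It is not the authors' evaluation code: the function names, the two entity types shown, and all counts are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure (harmonic mean of precision and recall) for one entity type."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Unweighted mean of per-entity F-measures (each entity type counts equally)."""
    return sum(f_measure(*c) for c in counts.values()) / len(counts)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative percentage improvement of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


# Hypothetical (true positive, false positive, false negative) counts,
# NOT taken from the paper.
hypothetical_counts = {
    "disease": (8, 2, 2),
    "soil": (6, 2, 4),
}

macro_f = macro_f_measure(hypothetical_counts)
print(f"macro F-measure: {macro_f:.4f}")
print(f"relative improvement: {relative_improvement(75.0, 60.0):.2f}%")
```

Under this convention, a relative improvement is measured against the baseline score rather than as an absolute difference in percentage points.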