Abstract: Agriculture plays a crucial role in Brazil's economy. As the country intensifies its activities in the sector, the use of pesticides also increases. Hence, the risks associated with pesticide-laden food consumption have become a concern for chemistry researchers. An issue affecting regulatory standardization of pesticides in Brazil is the difficulty in translating pesticide names, particularly from English. For example, the word malathion can be translated from English to Portuguese as malatiom or malatião, resulting in inconsistent labeling. This issue extends to the broader problem of translating highly technical terms between languages, in particular for low-resource languages. In this work, we investigate terminological variation in the chemistry of organophosphorus pesticides. Our goal is to study strategies for domain-specific multilingual keyword extraction. To that end, two corpora were built based on pesticide-related scientific documents in Brazilian Portuguese and English, which led to a total of 84 and 210 texts, respectively, representing the low- and high-resource languages in this study. We then assessed 6 methods for keyword extraction: Simple Maths, TF-IDF, YAKE, TextRank, MultipartiteRank, and KeyBERT. We relied on a multilingual contextual BERT embedding to retrieve corresponding pesticide names in the target language. Fine-tuning was also explored to improve the multilingual representation further. Moreover, we evaluated the use of large language models (LLMs) combined with the recent retrieval-augmented generation (RAG) framework. As a result, we found that the contextual approach, combined with fine-tuning, provided the best results, contributing to enhancing Pesticide Terminology Extraction in a multilingual scenario.
External IDs:dblp:journals/jbcs/SouzaACSGNSLPKDHA25
Loading