Abstract: Modern organizational communication relies heavily on virtual assistants, which in turn require robust Natural Language Understanding (NLU) models for effective interaction. This research addresses the challenges of developing NLU models for multiple languages: Estonian, English, German, Spanish, French, Italian, and Latvian. We explore several intent detection methodologies. Memory-based techniques combine vectorization with Language-agnostic BERT Sentence Embedding (LaBSE), Advanced Data Analysis (ADA), or Sentence-level MultimOdal and LaNguage-Agnostic Representations (SONAR) models with semantic search based on cosine similarity or Levenshtein distance. Supervised text classification methods include FastText with a Convolutional Neural Network (CNN), LaBSE with a Feed-Forward Neural Network (FFNN), and fine-tuned LaBSE. Text generation techniques leverage OpenAI's Davinci large language model. Our findings highlight the efficacy of memory-based approaches, particularly for non-English languages, and show the effectiveness of multilingual and cross-lingual LaBSE vectorization and of the SONAR model. Furthermore, we introduce open-source intent detection software tailored for Federated Learning (FL). Through a prototype, we demonstrate how this framework integrates into RASA-based virtual assistants, offering practical guidance for organizations interested in deploying intelligent, privacy-preserving conversational agents. This research advances virtual assistant development and highlights the potential of combining FL with NLU models. In future work, we plan to evaluate the framework on more languages and in real client scenarios.
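The memory-based idea can be illustrated with a minimal sketch: labeled example utterances are stored in memory, and a new query receives the intent of its nearest stored example, measured either by cosine similarity between vectors or by Levenshtein (edit) distance between strings. This is not the authors' implementation. The example data and the names `MEMORY`, `embed`, `detect_intent_cosine`, and `detect_intent_levenshtein` are hypothetical. In particular, `embed` is a bag-of-words stand-in, assumed here only so the sketch runs without LaBSE, ADA, or SONAR models.

```python
import math
from collections import Counter

# Hypothetical toy memory of (utterance, intent) pairs.
MEMORY = [
    ("what are your opening hours", "opening_hours"),
    ("when do you open", "opening_hours"),
    ("i want to reset my password", "reset_password"),
    ("forgot my password", "reset_password"),
]


def embed(text: str) -> Counter:
    """Stand-in for a sentence encoder such as LaBSE, ADA or SONAR.

    Assumption: a bag-of-words count vector, used only so the sketch is
    self-contained; a real system would use dense sentence embeddings.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared tokens divided by the product of the norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def levenshtein(s: str, t: str) -> int:
    # Dynamic-programming edit distance, keeping only the previous row.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (cs != ct),  # substitution (0 if equal)
            ))
        prev = curr
    return prev[-1]


def detect_intent_cosine(query: str) -> str:
    # Intent of the stored example whose vector is most similar to the query.
    q = embed(query)
    return max(MEMORY, key=lambda ex: cosine(q, embed(ex[0])))[1]


def detect_intent_levenshtein(query: str) -> str:
    # Intent of the stored example with the smallest edit distance.
    return min(MEMORY, key=lambda ex: levenshtein(query.lower(), ex[0]))[1]


if __name__ == "__main__":
    print(detect_intent_cosine("when are you open"))
    print(detect_intent_levenshtein("forgot password"))
```

Because the classifier is just a lookup over stored examples, adding a new intent or language only requires adding labeled utterances to memory, with no retraining. This is one plausible reason such approaches suit multilingual settings, although the sketch makes no claim about the paper's measured results.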
External IDs: doi:10.1007/978-3-031-63543-4_6