Cross-lingual Search for e-Commerce based on Query Translatability and Mixed-Domain Fine-Tuning

Jesus Perez-Martin, Jorge Gomez-Robles, Asier Gutiérrez-Fandiño, Pankaj Adsul, Sravanthi Rajanala, Leonardo Lezcano

Published: 01 Jan 2023, Last Modified: 16 Jun 2023WWW (Companion Volume) 2023Readers: Everyone

Abstract: Online stores in the US offer a unique scenario for Cross-Lingual Information Retrieval (CLIR) due to the mix of Spanish and English in user queries. Machine Translation (MT) provides an opportunity to lift relevance by translating the Spanish queries to English before delivering them to the search engine. However, polysemy-derived problems, high latency and context scarcity in product search, make generic MT an impractical solution. The wide diversity of products in marketplaces injects non-translatable entities, loanwords, ambiguous morphemes, cross-language ambiguity and a variety of Spanish dialects in the communication between buyers and sellers, posing a thread to the accuracy of MT. In this work, we leverage domain adaptation on a simplified architecture of Neural Machine Translation (NMT) to make both latency and accuracy suitable for e-commerce search. Our NMT model is fine-tuned on a mixed-domain corpus based on engagement data expanded with catalog back-translation techniques. Beyond accuracy, and given that translation is not the goal but the means to relevant results, the problem of Query Translatability is addressed by a classifier on whether the translation should be automatic or explicitly requested. We assembled these models into a query translation system that we tested and launched at Walmart.com , with a statistically significant lift in Spanish GMV and an nDCG gain for Spanish queries of +70%.

0 Replies