Embeddings Might Be All You Need: Domain-Specific Sentence Encoders for Latin American E-Commerce Questions
Abstract: In Latin American e-commerce, customer inquiries often exhibit regional linguistic patterns that require specialized handling for accurate responses. General-purpose sentence encoders may struggle with these nuances, leading to less effective answers. This study examines fine-tuned transformer models that produce domain-specific sentence embeddings for Portuguese and Spanish retrieval tasks. Our findings show that these specialized embeddings significantly outperform general-purpose pretrained models and traditional techniques such as BM25, eliminating the need for an additional re-ranking step in the retrieval pipeline. We further examine the effects of multi-objective training with Matryoshka Representation Learning, showing that it preserves retrieval quality across a range of embedding dimensions. Our approach offers a scalable and efficient solution for multilingual retrieval in e-commerce, reducing computational costs while maintaining high accuracy.
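The Matryoshka property described above means that a prefix of a trained embedding can stand in for the full vector at lower cost. The following minimal sketch (toy vectors and dimensions are illustrative assumptions, not the paper's actual model or data) shows the core retrieval mechanic: truncate each embedding to a prefix, re-normalize, and score by cosine similarity, with the relevance ranking surviving truncation.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components (Matryoshka-style prefix) and L2-normalize."""
    prefix = vec[:dim]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

def cosine(a, b):
    """Dot product of two L2-normalized vectors equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

# Toy embeddings (hypothetical values; a real encoder would produce these).
query        = [0.90, 0.10, 0.30, 0.05, 0.02, 0.01, 0.00, 0.00]
doc_relevant = [0.85, 0.15, 0.25, 0.10, 0.00, 0.05, 0.00, 0.00]
doc_offtopic = [0.10, 0.90, 0.00, 0.30, 0.20, 0.00, 0.10, 0.00]

# Score at the full dimension and at truncated prefixes: the ranking of the
# relevant document over the off-topic one is preserved at every size.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    score_rel = cosine(q, truncate_and_normalize(doc_relevant, dim))
    score_off = cosine(q, truncate_and_normalize(doc_offtopic, dim))
    assert score_rel > score_off, f"ranking flipped at dim={dim}"
```

In practice this is what allows an index built on shorter prefixes to cut storage and similarity-computation cost while keeping retrieval order largely intact.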
Paper Type: Long
Research Area: Information Retrieval and Text Mining
Research Area Keywords: representation learning, dense retrieval, e-commerce applications
Contribution Types: NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: Spanish, Portuguese
Submission Number: 7979