Abstract: Accurate query-product relevance labeling is indispensable for generating ground-truth datasets for search ranking in e-commerce. Traditional approaches to annotating query-product pairs rely on human labeling services, which are expensive, time-consuming, and prone to errors. In this work, we explore the application of Large Language Models (LLMs) to automate query-product relevance labeling for large-scale e-commerce search. We used several publicly available and proprietary LLMs for this task and conducted experiments on two open-source datasets and an in-house e-commerce search dataset. Using prompt-engineering techniques such as Chain-of-Thought (CoT) prompting, In-Context Learning (ICL), and Retrieval-Augmented Generation (RAG) with Maximum Marginal Relevance (MMR), we show that LLM performance has the potential to approach human-level accuracy on this task in a fraction of the time and cost required by human labelers, suggesting that our approach is more efficient than conventional methods. We have generated query-product relevance labels with LLMs at scale and are using them to evaluate improvements to our search algorithms. Our work demonstrates the potential of LLMs to improve query-product relevance, thereby enhancing the e-commerce search user experience. More importantly, this scalable alternative to human annotation has significant implications for information retrieval domains, including search and recommendation systems, where relevance scoring is crucial for optimizing the ranking of products and content to improve customer engagement and other conversion metrics.
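To make the MMR step referenced above concrete, the following is a minimal sketch of how labeled in-context examples could be selected for a relevance-labeling prompt; it is not the authors' exact pipeline, and the embedding dimensionality, trade-off weight `lam`, and the synthetic example pool are illustrative assumptions.

```python
# Sketch: MMR-based selection of in-context examples for an LLM
# query-product relevance-labeling prompt (illustrative assumptions only).
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def mmr_select(query_emb: np.ndarray,
               candidate_embs: list,
               k: int = 4,
               lam: float = 0.7) -> list:
    """Return indices of k candidates, trading off relevance to the query
    (weight lam) against redundancy with already-selected examples."""
    selected = []
    remaining = list(range(len(candidate_embs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -np.inf
        for i in remaining:
            relevance = cosine_sim(query_emb, candidate_embs[i])
            redundancy = max((cosine_sim(candidate_embs[i], candidate_embs[j])
                              for j in selected), default=0.0)
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


# Example: pick 2 diverse yet relevant labeled (query, product, label)
# examples to place in the prompt before asking the LLM to judge a new pair.
rng = np.random.default_rng(0)
query_emb = rng.normal(size=16)
example_pool = [rng.normal(size=16) for _ in range(10)]
print(mmr_select(query_emb, example_pool, k=2))
```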