Boolean interpretation, matching, and ranking of natural language queries in product selection systems

Matthew Moulton, Yiu-Kai Ng

Published: 2024, Last Modified: 23 Jan 2026, Discov. Comput. 2024, CC BY-SA 4.0

Abstract: E-commerce is a massive sector of the US economy, generating $767.7 billion in revenue in 2021. E-commerce sites maximize their revenue by helping customers find, examine, and purchase products. To help users easily find the products in the database most relevant to their individual needs, e-commerce sites are equipped with a product retrieval system. Many modern retrieval systems parse user-specified constraints or keywords embedded in a simple natural language query, which generally lets customers specify their needs more easily and quickly than navigating a product specification form and does not require the seller to design or develop such a form. These natural language product retrieval systems, however, suffer from low relevance in retrieved products, especially when complex constraints are specified on products. The reduced accuracy is due in part to under-utilizing the rich semantics of natural language, specifically in queries that include Boolean operators, and to the lack of ranking for partially-matched relevant results that could be of interest to customers. This undesirable effect causes e-commerce vendors to lose sales on their merchandise. To solve this problem, we propose a novel product retrieval system, called ${\textit{QuePR}}$, that parses both simple and arbitrarily complex natural language queries, with or without Boolean operators, utilizes combinatorial numeric and content-based matching to extract relevant products from a database, and ranks the retrieved products by relevance before presenting them to the end-user. The advantages of ${\textit{QuePR}}$ are its ability to process explicit and implicit Boolean operators in queries, handle natural language queries using similarity measures on partially-matched records, and make a best guess or match on ambiguous or incomplete queries. ${\textit{QuePR}}$ is unique, easy to use, and scalable to all product categories.
To verify the accuracy of ${\textit{QuePR}}$ in retrieving relevant products across different product domains, we conducted several performance analyses and compared ${\textit{QuePR}}$ with other ranking and retrieval systems. The empirical results verify that ${\textit{QuePR}}$ outperforms the others while maintaining an optimal runtime speed.
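
The idea of ranking partially-matched products against a Boolean query can be illustrated with a small sketch. This is not the scoring used by ${\textit{QuePR}}$, which the abstract does not specify; it is an assumed toy scheme in which an AND node averages the scores of its parts, an OR node takes the best part, and a NOT node inverts its part. The class names (`Cond`, `And`, `Or`, `Not`), the functions `score` and `rank`, and the sample car records are all hypothetical, and the query tree is built by hand rather than parsed from natural language.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Cond:
    """A single constraint on one product attribute (hypothetical type)."""
    attr: str
    test: Callable[[object], bool]


@dataclass
class And:
    parts: list


@dataclass
class Or:
    parts: list


@dataclass
class Not:
    part: object


def score(node, product: dict) -> float:
    """Return a degree in [0, 1] to which `product` satisfies `node`.

    Assumed scheme: AND = mean of parts, OR = max of parts, NOT = 1 - part.
    A missing attribute counts as an unsatisfied condition.
    """
    if isinstance(node, Cond):
        value = product.get(node.attr)
        return 1.0 if value is not None and node.test(value) else 0.0
    if isinstance(node, And):
        return sum(score(p, product) for p in node.parts) / len(node.parts)
    if isinstance(node, Or):
        return max(score(p, product) for p in node.parts)
    if isinstance(node, Not):
        return 1.0 - score(node.part, product)
    raise TypeError(f"unknown query node: {node!r}")


def rank(query, products: list) -> list:
    """Score every product, drop complete non-matches, sort best first."""
    scored = [(score(query, p), p) for p in products]
    matched = [sp for sp in scored if sp[0] > 0]
    return sorted(matched, key=lambda sp: sp[0], reverse=True)


# Hand-built tree for: "a red or blue Honda under $10,000, not manual"
query = And([
    Cond("make", lambda v: v == "Honda"),
    Or([Cond("color", lambda v: v == "red"),
        Cond("color", lambda v: v == "blue")]),
    Cond("price", lambda v: v < 10_000),
    Not(Cond("transmission", lambda v: v == "manual")),
])

# Made-up sample records, for illustration only.
products = [
    {"id": "a", "make": "Honda", "color": "blue", "price": 9_500, "transmission": "automatic"},
    {"id": "b", "make": "Honda", "color": "red", "price": 12_000, "transmission": "automatic"},
    {"id": "c", "make": "Toyota", "color": "green", "price": 15_000, "transmission": "manual"},
    {"id": "d", "make": "Toyota", "color": "red", "price": 8_000, "transmission": "manual"},
]

for s, p in rank(query, products):
    print(f"{p['id']}: {s:.2f}")
```

Under this assumed scheme, product `a` satisfies every constraint, `b` misses only the price limit, `d` satisfies two of the four constraints, and `c` satisfies none and is dropped. A strict Boolean filter would have returned only `a`; the graded score is what lets near-misses such as `b` appear in the results.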

External IDs: dblp:journals/ir/MoultonN24