Abstract: Most modern Web search engines implement query auto-completion (QAC) to facilitate faster query input by predicting the user's intended query. This is the case of Jusbrasil, Brazil's most prominent and widely used legal search engine platform. Query auto-completion is typically performed in two steps: matching and ranking. Matching selects candidate queries from a suggestions dataset. Ranking sorts the matching results according to a score function that attempts to place the most relevant suggestions at the top for the user. In this paper, our main goal is to explore the effectiveness of learning to rank algorithms in the ranking step of query auto-completion in the legal domain. In particular, we explore four learning to rank algorithms: LambdaMART, XGBoost, RankSVM, and Genetic Programming. LambdaMART is widely used in query auto-completion; on the other hand, as far as we know, this is the first time that RankSVM and XGBoost are used for this task. Additionally, we propose the use of Genetic Programming as a lightweight and viable alternative for query auto-completion. One difficulty in exploring learning to rank algorithms for query auto-completion is the lack of fine-grained training and test datasets, since learning to rank algorithms rely on a large number of features. To bridge this gap, and also to foster research in this area, we propose two datasets with different types of features for query auto-completion in the legal domain. The datasets were created by collecting data from several data sources at Jusbrasil, including contextual features from search query logs, enriched with additional features extracted from other sources such as the auto-completion log, document content, and metadata available at Jusbrasil.
Then, we show that learning to rank is effective for query auto-completion in the legal domain by answering four main research questions: 1) How does each feature, especially the novel ones proposed in our work, impact the rankings in query auto-completion?; 2) How effective is learning to rank with respect to Most Popular Completion (MPC), a ranking algorithm widely adopted as a baseline in the literature?; 3) Among the four alternatives experimented with, which learning to rank algorithm is the most effective in the legal domain?; and 4) How effective is learning to rank with respect to ranking models based on BERT and ColBERT? Finally, we conduct an online A/B test at Jusbrasil.
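The two-step QAC pipeline described above (prefix matching followed by ranking) and the MPC baseline can be illustrated with a minimal sketch. This is not the authors' implementation; the toy query log and function names are hypothetical, and real systems like Jusbrasil's would rank with a learned model over many features rather than raw frequency.

```python
from collections import Counter

# Hypothetical toy query log; a real system would use actual search logs.
query_log = [
    "habeas corpus", "habeas corpus", "habeas data",
    "dano moral", "habeas corpus",
]
popularity = Counter(query_log)

def autocomplete(prefix, k=2):
    """Two-step QAC: match candidate queries by prefix, then rank them.

    The score function here is Most Popular Completion (MPC), i.e. past
    frequency; learning to rank replaces this scoring step.
    """
    # Matching: select candidate queries that start with the prefix.
    candidates = [q for q in popularity if q.startswith(prefix)]
    # Ranking: order candidates by the score function (here, popularity).
    candidates.sort(key=lambda q: popularity[q], reverse=True)
    return candidates[:k]

print(autocomplete("habeas"))  # MPC ranks "habeas corpus" (3 hits) first
```

Under MPC, ties and low-frequency candidates are where a learned ranker with richer features (context, document content, metadata) can improve over simple popularity.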
External IDs: dblp:journals/jbcs/DominguesRMS25