AliExpress Learning-to-Rank: Maximizing Online Model Performance Without Going Online

Published: 01 Jan 2023, Last Modified: 12 May 2023. IEEE Trans. Knowl. Data Eng. 2023
Abstract: Learning-to-rank (LTR) has become a key technology in E-commerce applications. Most existing LTR approaches follow a supervised learning paradigm with data collected from an online system. Yet, LTR models sometimes perform well on the offline validation set but poorly on online metrics, suggesting an inconsistency between offline and online evaluation measurements. We confirm that this inconsistency exists in AliExpress Search, the search engine of an international E-commerce business. One major cause of the inconsistency is that offline evaluation ignores the item context: the item order produced by a newly served model always differs from the order recorded in the offline dataset. This paper proposes an evaluator-generator framework for E-commerce LTR that accounts for the item context. The framework consists of an evaluator that generalizes to score rankings together with their context, a generator that maximizes the evaluator score with reinforcement learning, and an adversarially trained discriminator that keeps the generator's exploration reliable. Extensive experiments in a simulation environment and in AliExpress Search show that, first, the classic data-based metrics on the offline dataset are clearly inconsistent with online performance and can even be misleading. Second, the proposed evaluator score is significantly more consistent with online performance than common ranking metrics. As a result, our method achieves a significant improvement (>2%) in terms of Conversion Rate (CR) over an industrial-level fine-tuned model in online A/B tests.
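The evaluator-generator idea in the abstract can be sketched in a toy setting. Everything below is an illustrative assumption, not the paper's actual model: the evaluator is a hand-written context-aware list score (each item's value is discounted by rank and penalized for redundancy with the item placed just before it), the generator is a Plackett-Luce (sequential softmax) policy trained with REINFORCE to maximize the evaluator score, and the adversarially trained discriminator is omitted entirely.

```python
# Toy sketch of an evaluator-generator loop (all names, features, and the
# evaluator's form are hypothetical; the paper's discriminator is omitted).
import math
import random

random.seed(0)

ITEMS = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.7, 0.3)]  # toy item features

def evaluator(order):
    """Context-aware list score: each item's relevance is rank-discounted
    and penalized for redundancy with the previously placed item."""
    total, prev = 0.0, None
    for rank, i in enumerate(order):
        base = ITEMS[i][0]                                # toy relevance
        redundancy = ITEMS[i][0] * prev[0] if prev else 0.0
        total += (base - 0.5 * redundancy) / math.log2(rank + 2)
        prev = ITEMS[i]
    return total

def sample_order(theta):
    """Generator: sample a ranking step by step with a Plackett-Luce
    (sequential softmax) policy, recording what the gradient needs."""
    remaining, order, steps = list(range(len(ITEMS))), [], []
    while remaining:
        m = max(theta[i] for i in remaining)
        exps = [math.exp(theta[i] - m) for i in remaining]
        z = sum(exps)
        probs = [e / z for e in exps]
        r, acc, choice = random.random(), 0.0, remaining[-1]
        for i, p in zip(remaining, probs):
            acc += p
            if r <= acc:
                choice = i
                break
        steps.append((list(remaining), probs, choice))
        order.append(choice)
        remaining.remove(choice)
    return order, steps

def reinforce(n_episodes=400, lr=0.3):
    """Train the generator to maximize the evaluator score with REINFORCE
    and a moving-average baseline."""
    theta = [0.0] * len(ITEMS)
    baseline = 0.0
    for _ in range(n_episodes):
        order, steps = sample_order(theta)
        reward = evaluator(order)
        baseline = 0.9 * baseline + 0.1 * reward
        adv = reward - baseline
        # Exact log-prob gradient of the sequential softmax policy.
        for remaining, probs, choice in steps:
            for i, p in zip(remaining, probs):
                grad = (1.0 - p) if i == choice else -p
                theta[i] += lr * adv * grad
    return theta
```

The key point the sketch illustrates is the decoupling: the evaluator judges a whole ordered list (so context effects are visible to it), while the generator is optimized against that score rather than against per-item labels from a logged dataset.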