Rate-Optimal Online Learning for Dynamic Assortment Selection with Positioning

Published: 10 Aug 2025, Last Modified: 07 May 2026 | Operations Research | Everyone | CC BY 4.0
Abstract: In online retailing, the seller aims to offer an assortment of items that maximizes revenue. We introduce a new online learning problem, dynamic assortment selection with positioning (DAP), in which the seller additionally learns the optimal positioning of items within the assortment. Specifically, customers make purchases according to a multinomial logit choice model in which each item's attractiveness is the product of a position effect and an unknown preference parameter. We first demonstrate that any assortment-only algorithm that neglects position effects incurs linear regret. To address this gap, we propose the truncated linear regression upper confidence bound (TLR-UCB) policy. TLR-UCB utilizes a novel geometric linear bandit-type feedback structure for UCB construction under random and adaptive position effects. In addition, TLR-UCB applies carefully designed truncations before linear regression to handle conditional geometric responses. In theory, we establish a regret upper bound of $\tilde{O}(T^{1/2})$ for TLR-UCB, matching our derived $\Omega(T^{1/2})$ lower bound. Moreover, we develop an explore-in-TLR-UCB (EI-TLR) policy to tackle unknown position effects. It first conducts a joint learning procedure to estimate the unknown preferences and position effects, and then implements a generalized TLR-UCB procedure driven by the estimated position effects. Extensive experiments demonstrate the superior performance of TLR-UCB and EI-TLR over other benchmark policies.
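To make the choice model described in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard multinomial logit form with a no-purchase option of weight 1, and it takes an item's attractiveness in a slot to be the slot's position effect times the item's preference parameter. The item names, preference values, revenues, and position effects are all hypothetical, and the brute-force search over orderings is for illustration only.

```python
from itertools import permutations


def choice_probs(prefs, position_effects):
    """MNL purchase probabilities; slot k holds the item with preference prefs[k]."""
    # Attractiveness = position effect * preference parameter (as in the abstract).
    weights = [lam * v for lam, v in zip(position_effects, prefs)]
    # Assumption: the no-purchase option has weight 1 (a common normalization).
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights]


def expected_revenue(prefs, revenues, position_effects):
    """Expected revenue from one customer for a given ordering of items."""
    probs = choice_probs(prefs, position_effects)
    return sum(r * p for r, p in zip(revenues, probs))


def best_ordering(items, position_effects):
    """Brute-force search over orderings (only feasible for tiny examples)."""
    best = None
    for order in permutations(items):
        prefs = [items[name][0] for name in order]
        revs = [items[name][1] for name in order]
        rev = expected_revenue(prefs, revs, position_effects)
        if best is None or rev > best[1]:
            best = (order, rev)
    return best


# Hypothetical data: item -> (preference parameter, revenue).
items = {"A": (1.0, 10.0), "B": (0.5, 20.0)}
# Hypothetical position effects: the top slot is twice as visible as the second.
position_effects = [1.0, 0.5]

order, revenue = best_ordering(items, position_effects)
print(order, revenue)
```

With these made-up numbers, placing B in the top slot gives an expected revenue of 7.5, while placing A first gives about 6.67, even though both orderings offer the same assortment. This is a toy illustration of why ignoring positions can be costly; it says nothing about the learning policies themselves.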