Keywords: choice models, multinomial logit, delayed feedback
Abstract: Choice models are essential for understanding decision-making processes in domains like online advertising, product recommendations, and assortment optimization. The Multinomial Logit (MNL) model is particularly versatile in selecting products or advertisements for display. However, challenges arise with unknown MNL parameters and delayed feedback, requiring sellers to learn customers’ choice behavior and make dynamic decisions with biased knowledge due to delays. We address these challenges by developing an algorithm that handles delayed feedback, balancing exploration and exploitation using confidence bounds and optimism. We first consider a censored setting where a threshold for considering feedback is imposed by business requirements. Our algorithm demonstrates a $\tilde{O}(\sqrt{NT})$ regret, with a matching lower bound up to a logarithmic term. Furthermore, we extend our analysis to environments with non-thresholded delays, achieving a $\tilde{O}(\sqrt{NT})$ regret. To validate our approach, we conduct experiments that confirm the effectiveness of our algorithm.
Supplementary Material: zip
Primary Area: Online learning
Submission Number: 16291
Loading