Keywords: multinomial logistic choice model, contextual bandits, diversity, regret analysis
TL;DR: We propose the diversified multinomial logit choice model, which directly incorporates assortment diversity, and design a computationally efficient algorithm that operates effectively under this model.
Abstract: Existing contextual multinomial logit (MNL) bandits model relevance-driven choice but ignore the potential benefits of within-assortment diversity, while submodular/combinatorial bandits encode diversity in rewards but lack structured choice probabilities. We bridge this gap with the *diversified multinomial logit* (DMNL) contextual bandit, which augments MNL choice probabilities with a generally submodular diversity function, thereby formalizing the relevance–diversity trade-off within a single model.
Incorporating diversity renders exact MNL assortment optimization intractable. We propose a *white-box* UCB-based algorithm, `OFU-DMNL`, that constructs assortments item-wise by maximizing optimistic marginal gains, avoids black-box optimization oracles, and provides end-to-end guarantees.
We show that `OFU-DMNL` attains an approximation ratio of at least $(1-\tfrac{1}{e+1})$ with an *approximate* regret bound of $\tilde{O}\big(d \sqrt{T/K}\big)$, where $d$ is the context dimension, $K$ the maximum assortment size, and $T$ the horizon, improving on the approximation factor of standard submodular baselines.
Experiments demonstrate consistent gains and, relative to exhaustive enumeration, comparable regret with substantially lower runtime. Overall, DMNL bandits provide a principled and practical foundation for diversity-aware assortment optimization under uncertainty, and `OFU-DMNL` offers a statistically and computationally efficient solution.
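The abstract describes `OFU-DMNL` as building each assortment item-wise by maximizing optimistic marginal gains. A minimal sketch of that greedy selection loop is below; the callables `ucb_score` (optimistic relevance estimate) and `diversity_gain` (marginal increment of a submodular diversity function) are hypothetical stand-ins for the paper's actual estimators, not its stated interface.

```python
import numpy as np

def greedy_optimistic_assortment(items, ucb_score, diversity_gain, K):
    """Item-wise greedy construction of an assortment of size at most K.

    At each step, add the remaining item whose optimistic marginal gain
    (relevance UCB plus submodular diversity increment) is largest.
    `ucb_score(i)` and `diversity_gain(S, i)` are assumed interfaces,
    standing in for the paper's confidence-bound and diversity terms.
    """
    assortment = []
    remaining = list(items)
    for _ in range(K):
        if not remaining:
            break
        # Optimistic marginal gain of adding each candidate item.
        gains = [ucb_score(i) + diversity_gain(assortment, i) for i in remaining]
        best = int(np.argmax(gains))
        if gains[best] <= 0:
            break  # no candidate improves the optimistic objective
        assortment.append(remaining.pop(best))
    return assortment
```

Because the diversity term is submodular, marginal gains shrink as the assortment grows, which is the structural property behind constant-factor guarantees for greedy selection of this kind.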
Supplementary Material: zip
Primary Area: learning theory
Submission Number: 9827