Learning to price ancillary seats with Bayesian Value Iteration

Published: 19 Nov 2025, Last Modified: 28 Jan 2026, Agifors Annual Symposium, Everyone, CC BY 4.0
Abstract: We study airline ancillary seat price optimization as a contextual multi-armed bandit in which context (flight type, itinerary, time to departure) informs price selection and the policy "revenue-manages" the full seat inventory over the booking window. We model demand with a Poisson GLM and handle the unknown elasticities within a Bayesian belief-MDP. On small problem instances, we compute the optimal policy by value iteration, which balances exploration and revenue exactly. To scale to large problem instances, we approximate the value function with a dual-stream deep learning network that separates arm uncertainty from contextual effects and fuses them into a single value estimate. Across realistic simulations, the approach increases revenue and reduces regret relative to LinUCB, LinTS, and Tree-UCB benchmarks while preserving fast decision times. We discuss sensitivity to priors and price grids, and integration with inventory and booking-window constraints.
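To make the small-instance case concrete, the following is a minimal sketch of exact finite-horizon value iteration over a belief state. It is not the authors' implementation. It assumes a log-linear Poisson rate exp(ALPHA - beta * price), a known intercept, a two-point prior over the elasticity beta, and no context features. The price grid, candidate elasticities, horizon, and all function names are hypothetical choices made for illustration.

```python
import math
from functools import lru_cache

# All constants below are illustrative assumptions, not values from the paper.
PRICES = (10.0, 20.0, 30.0)   # hypothetical price grid (the "arms")
BETAS = (0.05, 0.15)          # hypothetical candidate price elasticities
ALPHA = 1.0                   # assumed known log-intercept of the Poisson GLM
HORIZON = 3                   # number of pricing periods in the booking window


def rate(price, beta):
    """Poisson demand rate under an assumed log-linear GLM."""
    return math.exp(ALPHA - beta * price)


def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)


def sales_likelihood(sold, seats, lam):
    """P(sales == sold) when Poisson(lam) demand is censored at `seats`."""
    if sold < seats:
        return poisson_pmf(sold, lam)
    # Selling out means demand was at least `seats`.
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(seats))


@lru_cache(maxsize=None)
def value(t, seats, belief):
    """Return (optimal expected revenue-to-go, optimal price).

    `belief` is a tuple of posterior weights over BETAS.
    """
    if t == HORIZON or seats == 0:
        return 0.0, None
    best_value, best_price = -math.inf, None
    for price in PRICES:
        q = 0.0
        for sold in range(seats + 1):
            like = [sales_likelihood(sold, seats, rate(price, b)) for b in BETAS]
            marginal = sum(w * l for w, l in zip(belief, like))
            if marginal <= 0.0:
                continue
            # Bayes update of the belief after observing `sold`.
            posterior = tuple(w * l / marginal for w, l in zip(belief, like))
            future, _ = value(t + 1, seats - sold, posterior)
            q += marginal * (price * sold + future)
        if q > best_value:
            best_value, best_price = q, price
    return best_value, best_price


if __name__ == "__main__":
    v, p = value(0, 2, (0.5, 0.5))
    print(f"expected revenue {v:.3f}, first price {p}")
```

Because the recursion enumerates every sales outcome and the posterior it induces, the cost grows exponentially with the horizon. This illustrates why exact value iteration is limited to small instances and why an approximate value function is needed at scale.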