Keywords: contextual bandits, online learning, partial feedback, non-stationarity, cost-sensitive decision-making, adversarial robustness
Abstract: Public agencies like city governments face sequential, cost-sensitive choices under partial feedback, for example, deciding whether to \emph{inspect} or \emph{not inspect} a construction permit given categorical descriptors, spatial coordinates, and stage metadata. There are operational costs of doing inspections and thus doing all possible inspections is excessively costly. We frame this as a contextual bandit problem and ask: Can regret-minimizing online policies reduce cumulative cost without much hyperparameter tuning when
conditions drift or become strategic?
Submission Number: 418
Loading