Efficient Algorithms for Contextual Apple Tasting with Log-Loss

Published: 25 May 2026, Last Modified: 27 May 2026 | DEMO 2026 Poster | CC BY 4.0
Keywords: contextual bandit, apple tasting, LLM cascading
Abstract: This paper introduces two algorithms for the Contextual Apple Tasting problem, in which the learner faces asymmetric feedback: the binary label is observed only when the learner chooses the 'Accept' action. To address the resulting decision bias and exploration challenges, we propose two complementary strategies. First, LogCBPSide-AT uses Confidence Bounds for Partial monitoring (CBP) to explicitly quantify predictive uncertainty and balance exploration against exploitation. Second, LogCB-AT reduces the apple tasting problem to an online regression oracle. This reduction is computationally efficient and scalable, and it avoids the complex, often intractable confidence-bound constructions required by traditional methods. We prove that both algorithms achieve sublinear regret for losses associated with the binary labels, so the guarantees hold even under this restricted feedback. We validate the methods empirically on adaptive Large Language Model (LLM) cascading, where they trade off inference cost against response accuracy.
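To make the feedback structure concrete, the following is a minimal sketch of the apple tasting interaction loop, not either of the paper's algorithms. It assumes a simulated logistic label model, an online logistic regression learner updated by a log-loss gradient step, and a simple epsilon-style exploration rule swapped in for illustration. All names (`run_apple_tasting`, `explore`, `lr`, `w_true`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_loss(p, y, eps=1e-12):
    # Log-loss of predicted probability p against binary label y.
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def run_apple_tasting(rounds=2000, dim=3, explore=0.1, lr=0.5, seed=0):
    """Simulate apple tasting: the label is revealed only on 'Accept'."""
    rng = random.Random(seed)
    # Hidden parameter of the simulated environment (assumption, simulation only).
    w_true = [rng.uniform(-2, 2) for _ in range(dim)]
    w = [0.0] * dim  # learner's online logistic regression weights
    observed = 0
    total_log_loss = 0.0
    for _ in range(rounds):
        x = [rng.uniform(-1, 1) for _ in range(dim)]
        p_true = sigmoid(sum(a * b for a, b in zip(w_true, x)))
        y = 1 if rng.random() < p_true else 0
        p = sigmoid(sum(a * b for a, b in zip(w, x)))
        # Illustrative rule (not from the paper): accept when the predicted
        # label probability is at least 1/2, or explore with small probability.
        accept = p >= 0.5 or rng.random() < explore
        if accept:
            # Asymmetric feedback: y is visible only in this branch.
            observed += 1
            total_log_loss += log_loss(p, y)
            # Gradient of the log-loss with respect to w is (p - y) * x.
            w = [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]
        # On 'Reject' the learner receives no label and makes no update.
    return {
        "rounds": rounds,
        "observed": observed,
        "avg_log_loss": total_log_loss / max(observed, 1),
        "weights": w,
    }


if __name__ == "__main__":
    print(run_apple_tasting())
```

In this sketch, setting `explore` to 0 leaves the learner with labels only from rounds it already favored, which is the decision bias that the abstract says the proposed methods are designed to handle.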
Submission Number: 141