The Nah Bandit: Modeling User Noncompliance in Recommendation Systems

Tianyue Zhou, Jung-Hoon Cho, Cathy Wu

Published: 01 Dec 2025, Last Modified: 22 Jan 2026, IEEE Transactions on Control of Network Systems, CC BY-SA 4.0

Abstract: Recommendation systems now pervade the digital world, ranging from advertising to entertainment. However, it remains challenging to implement effective recommendation systems in the physical world, such as in mobility or health. This work focuses on a key challenge: in the physical world, it is often easy for users to opt out of taking any recommendations if they are not to their liking, and to fall back to their baseline behavior. It is thus crucial in cyber-physical recommendation systems to operate with an interaction model that is aware of such user behavior, or else the user may abandon the recommendations altogether. This article introduces the Nah Bandit, a tongue-in-cheek name for a bandit problem where users can say “nah” to the recommendation and opt for their preferred option instead. As such, this problem lies in between a typical bandit setup and supervised learning. We model user noncompliance by parameterizing an anchoring effect of recommendations on users. We then propose the expert with clustering (EWC) algorithm, a hierarchical approach that incorporates feedback from both recommended and nonrecommended options to accelerate user preference learning. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ clusters, EWC achieves a regret bound of $O(N\sqrt{T\log K} + NT)$, which gives superior theoretical performance in the short term compared to the LinUCB algorithm. Moreover, we show that this bound decreases further as the user compliance rate increases. Experimental results also show that EWC outperforms both supervised learning and traditional contextual bandit approaches. These results indicate that effective use of noncompliance feedback can accelerate preference learning and improve recommendation accuracy. This work lays the foundation for future research on the Nah Bandit, providing a robust framework for more effective recommendation systems.

External IDs: doi:10.1109/tcns.2025.3600814