Abstract: Wearable devices allow collecting data at an individual level, which can be used to propose an unseen degree of personalization for a broad domain of applications. For instance, we focus on electrochromic frames that allow the lens tint to be changed manually, or automatically based on an ambient light sensor. We aim to use the user's interactions with their frames to adapt this automatic mode to better reflect their preferences. From a technical standpoint, this is a difficult task, as prediction and estimation cannot be done separately. That is why we approach this industrial problem from a reinforcement learning perspective: a policy must control the tint class in such a way that the number of user interactions is minimized. A particularity of this problem is that there is an inherent order between the finite set of proposed tint classes, as some are darker than others. The usual Boltzmann parametrization does not account for this. Thus, we develop and implement policy gradient methods for ordinal policies. Using a simulation setting, we show that ignoring the ordinal structure of the response variables yields a suboptimal strategy. Additionally, we test this technique with real users in controlled conditions; as the tint-control mode updated, the number of user interactions decreased. Finally, ordinal policies can be adapted to a deep reinforcement learning context, solving classic continuous-action problems by discretizing the action space.
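To make the contrast concrete, the sketch below compares a standard Boltzmann (softmax) parametrization, which treats tint classes as unordered, with one common way to encode an ordinal structure: a cumulative-logit (proportional-odds) parametrization. This is an illustrative assumption about how an ordinal policy could be built, not necessarily the exact parametrization used in the paper; the threshold values and score function are hypothetical.

```python
import numpy as np

def softmax_policy(logits):
    """Boltzmann parametrization: one free logit per tint class,
    with no notion of order between the classes."""
    z = np.exp(logits - logits.max())  # shift for numerical stability
    return z / z.sum()

def ordinal_policy(score, thresholds):
    """Cumulative-logit parametrization (an illustrative choice for
    ordinal responses): a single score s(x) is compared to increasing
    thresholds theta_1 < ... < theta_{K-1}, so probability mass moves
    monotonically toward darker classes as the score grows.

    P(A <= k) = sigmoid(theta_k - s(x)); class probabilities are the
    differences of consecutive cumulative probabilities.
    """
    sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
    cdf = np.concatenate([sigmoid(np.asarray(thresholds) - score), [1.0]])
    return np.diff(cdf, prepend=0.0)

# Four ordered tint classes; a higher score favors darker tints.
thresholds = [-1.0, 0.0, 1.0]  # hypothetical, must be increasing
p_light = ordinal_policy(-2.0, thresholds)  # low ambient-light score
p_dark = ordinal_policy(2.0, thresholds)    # high ambient-light score
```

A practical consequence of this parametrization is that adjacent tint classes receive similar probabilities, whereas a softmax over independent logits can assign arbitrary mass to non-adjacent classes.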