Keywords: reinforcement learning, contextual bandits, off-policy RL
Abstract: In this work, we describe practical lessons we have learned from
successfully using contextual bandits (CBs) to improve key business
metrics of the Microsoft Virtual Agent for customer support. While
our current use cases focus on single-step reinforcement learning (RL),
mostly in the domains of natural language processing and
information retrieval, we believe many of our findings are
generally applicable. In this article, we highlight certain
issues that RL practitioners may encounter in similar types of
applications and offer practical solutions to these
challenges.