Optimizing Chatbot Fallback Intent Selections with Reinforcement Learning

Published: 29 Jun 2023, Last Modified: 04 Oct 2023, MFPL Poster
Keywords: Reinforcement Learning, Chatbot, Uncertainty calibration, Multi-agent RL
TL;DR: This paper shows that reinforcement learning can be used to assess the validity of a chatbot's answers, i.e., to learn to fall back on asking for clarifications when needed, by adapting to the semantic pitfalls of a language model in a given environment.
Abstract: Large language models such as those used in GPT-4 and Alexa are limited in their ability to assess the validity of their own answers, i.e., to fall back on a clarification intent when needed. Reinforcement learning can be used specifically to address this fallback selection problem, by adapting to the semantic pitfalls of a given language model in a given environment. This is demonstrated in a simplified environment where the chatbot learns when it is best to ask for clarifications. After training, it identifies correct intents in $<$ 2 interactions on average in over 99% of dialogues. In multi-agent simulations where the user cooperates, the chatbot identifies correct intents in 1.3 interactions on average in 100% of dialogues.
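The core decision the abstract describes — answer with the top intent, or fall back and ask for clarification — can be sketched as a small tabular RL problem. The sketch below is purely illustrative and is not the paper's actual formulation: the confidence buckets, reward values, and simulated environment are all assumptions made for the example.

```python
import random

# Hypothetical sketch of learning a fallback policy with tabular Q-learning.
# States: discretized confidence buckets of an intent classifier.
# Actions: 0 = answer with the top intent, 1 = fall back and ask to clarify.
N_BUCKETS = 10
ACTIONS = (0, 1)
ALPHA, EPS = 0.1, 0.1  # learning rate and exploration rate (assumed values)

Q = [[0.0, 0.0] for _ in range(N_BUCKETS)]

def reward(action, intent_correct):
    """Assumed rewards: +1 for a correct answer, -1 for a wrong one,
    -0.2 for asking a clarification (the cost of an extra turn)."""
    if action == 1:
        return -0.2
    return 1.0 if intent_correct else -1.0

def train(episodes=20000, seed=0):
    rng = random.Random(seed)
    for _ in range(episodes):
        bucket = rng.randrange(N_BUCKETS)
        # Simulated environment: higher confidence -> more likely correct.
        correct = rng.random() < (bucket + 1) / (N_BUCKETS + 1)
        if rng.random() < EPS:
            a = rng.choice(ACTIONS)  # explore
        else:
            a = max(ACTIONS, key=lambda x: Q[bucket][x])  # exploit
        # One-step problem, so the update has no bootstrapped next-state term.
        Q[bucket][a] += ALPHA * (reward(a, correct) - Q[bucket][a])

train()
policy = [max(ACTIONS, key=lambda a: Q[b][a]) for b in range(N_BUCKETS)]
print(policy)
```

Under these assumed rewards, the learned policy falls back to clarifying in low-confidence buckets and answers directly in high-confidence ones, mirroring the fallback behavior the paper targets.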
Submission Number: 15