On Overcoming Miscalibrated Conversational Priors in LLM-based ChatBots

Published: 26 Apr 2024, Last Modified: 15 Jul 2024 · UAI 2024 poster · CC BY 4.0
Keywords: preference elicitation, cost-aware, under-specification, large language model, LLM copilot
TL;DR: We frame conversational recommendation with query under-specification as a Partially Observed Decision Process, and propose a prompt-based intervention framework atop LLMs to encourage cost- and context-aware uncertainty reduction when warranted.
Abstract: We explore the use of Large Language Model (LLM)-based chatbots to power recommender systems. We observe that these chatbots respond poorly when they encounter under-specified requests (e.g., they make incorrect assumptions, hedge with a long response, or refuse to answer). We conjecture that such miscalibrated response tendencies (i.e., conversational priors) can be attributed to LLM fine-tuning by annotators --- single-turn annotations may not capture multi-turn conversation utility, and the annotators' preferences may not even be representative of users interacting with a recommender system. We first analyze public LLM chat logs to conclude that query under-specification is common. Next, we study synthetic recommendation problems with known but latent item utilities, and frame them as Partially Observed Decision Processes (PODP). We find that pre-trained LLMs can be sub-optimal for PODPs and derive better policies that clarify under-specified queries when appropriate. Then, we re-calibrate LLMs by prompting them with learned control messages to approximate the improved policy. Finally, we show empirically that our lightweight learning approach effectively uses logged conversation data to re-calibrate the response strategies of LLM-based chatbots for recommendation tasks.
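The sketch below illustrates the kind of prompt-based intervention the abstract describes: prepending a control message to the conversation so the chatbot asks a clarifying question when a request is under-specified, and answers directly otherwise. It is a minimal illustration only; `llm_complete` is a hypothetical wrapper around any chat-completion API, and the control message is a hand-written stand-in, not the paper's learned control message.

```python
# Minimal sketch of a prompt-based re-calibration intervention (illustrative only).
# `llm_complete` is a hypothetical callable wrapping a chat-completion API;
# the control message below is a hand-written stand-in for a learned one.

from typing import Callable, Dict, List

Message = Dict[str, str]

CONTROL_MESSAGE = (
    "If the user's request is under-specified for the recommendation task, "
    "ask one short clarifying question before recommending; otherwise, "
    "answer directly and concisely."
)


def recalibrated_reply(
    llm_complete: Callable[[List[Message]], str],
    conversation: List[Message],
) -> str:
    """Prepend the control message so the chatbot clarifies only when warranted."""
    messages: List[Message] = [{"role": "system", "content": CONTROL_MESSAGE}]
    messages.extend(conversation)
    return llm_complete(messages)


if __name__ == "__main__":
    # Stubbed model for demonstration; a real deployment would call an LLM here.
    fake_llm = lambda msgs: "Which cuisine and price range are you looking for?"
    reply = recalibrated_reply(
        fake_llm, [{"role": "user", "content": "Find me a restaurant."}]
    )
    print(reply)
```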
List Of Authors: Herlihy, Christine and Neville, Jennifer and Schnabel, Tobias and Swaminathan, Adith
Submission Number: 613