% \section{Conclusion \& Future Work}
\section{Conclusion}
This paper explores how user underspecification affects the behavior of LLM-based chatbots that are fine-tuned with human feedback. We show that chatbots have difficulty handling vague user requests and explain how this issue stems from the annotation process of LLMs. Our study of a public chat-log dataset confirms that the problem is common: over 25\% of the queries are highly underspecified. We formulate underspecification as a partially observable decision process (PODP) and generate synthetic data from a recommendation scenario with hidden item values for experimental evaluation. Our experiments show that pre-trained LLMs perform poorly on underspecified user queries. To address this, we propose a method that adjusts LLM behavior through prompting with learned control messages. We demonstrate that this lightweight learning method can effectively leverage previous conversation data to improve the response behavior of LLM-based chatbots for recommendation tasks.

% \textcolor{mangeta}{add future work stuff related to validating our helper LLM classifications and assumptions regarding user response behavior, and actual preferences with real humans}

% We view validation of our approach with human users as a valuable direction for future work.


% future work ideas: 

% algorithmic approach that is able to incorporate an individual user's (persisted over time, though possibly not statically so), accounting for heterogeneity wrt tolerance for questions (as opposed to direct response), perhaps conditioned on intent or intent domain.

% addtl analysis comparing the results of llm "introspection"/self-reported uncertainty with probability distribution over tokens

% applicability beyond recommendation/RAG ~ other domain-specific tasks featuring text-based inputs/text as data, etc. where we have an action space that can also be parameterized by sys messages ~ e.g., clinical  decision-support where inputs include clinical notes

% other options for teaching the LLM the states in which shorter trees featuring addtl uncertainty reduction at the onset should be preferred over longer trees containing "direct response in the face of uncertainty" followed by backtracking (i.e., the earlier counterfactual trees idea)

% empirical eval of proposed approaches w/real-world users to validate observed findings/the extent to which our findings generalize to different user subgroups



