Eliciting Behaviors in Multi-Turn Conversations

ICLR 2026 Conference Submission 22182 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: behavior elicitation, multi-turn conversation, LLM evaluation
TL;DR: We study behavior elicitation in multi-turn conversations and find online interaction methods are most query efficient.
Abstract: Identifying specific, and often complex, behaviors of large language models (LLMs) in conversational settings is crucial for their evaluation. Recent work proposes novel techniques to find natural language prompts that induce specific behaviors from a target model, yet these have mainly been studied in single-turn settings. In this work, we study behavior elicitation in the context of multi-turn conversations. We first offer an analytical framework that categorizes existing methods into three families based on how they interact with the target model: those that use only prior knowledge, those that use offline interactions, and those that learn from online interactions. We then propose a multi-turn extension of the online method. We evaluate all three families of methods on the task of generating test cases for multi-turn behavior elicitation. We investigate the efficiency of these approaches by analyzing the trade-off between the query budget, i.e., the number of interactions with the target model, and the success rate, i.e., the discovery rate of behavior-eliciting inputs. We find that online methods can achieve a 20-60% success rate with just a few thousand queries on three tasks where static methods used in existing multi-turn conversation benchmarks fail to find any failure cases. Our work highlights a novel application of behavior elicitation methods in multi-turn conversation evaluation and the need for the community to move towards dynamic benchmarks.
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22182