Keywords: Large Language Models, Bayesian Experimental Design, Sequential Information Gathering, Expected Information Gain, Multi-Turn Reasoning, Group Relative Policy Optimisation
TL;DR: This paper introduces ASIG, a fine-tuning framework that amortises Bayesian experimental design into LLM policies for efficient sequential information gathering.
Abstract: Large language models (LLMs) exhibit strong reasoning and world-knowledge capabilities, yet often struggle to gather information effectively across the multi-turn interactions required in sequential decision-making settings. We introduce Amortised Sequential Information Gathering (ASIG), a fine-tuning approach that amortises Bayesian Experimental Design (BED) into LLM policies via a multi-turn extension of Group Relative Policy Optimisation with an Expected Information Gain reward. Evaluated on the 20 Questions task, ASIG more than doubles the success rate of the 7B base model and reduces inference cost by over $25\times$ relative to BED-LLM, a competitive inference-time baseline. Applied to MediQ, a medical diagnosis benchmark unseen during training, ASIG improves information-seeking performance at the 7B scale, suggesting that the learned strategies can transfer out of distribution. Our findings show that amortising BED into LLM policies provides an effective and computationally efficient approach to sequential information gathering.
Submission Number: 225