A Llama Sunk My Battleship! Asking Rational Questions with LLMs via Bayesian Inference

Published: 10 Oct 2024, Last Modified: 29 Oct 2024 · Sys2-Reasoning Poster · CC BY 4.0
Keywords: Question-asking, reasoning, informativity, grounding, Bayesian inference, Monte Carlo search, mental computation, probabilistic programming, world models, resource rationality, human cognition.
TL;DR: We introduce a Bayesian model that combines an LLM-driven prior with a probabilistic world model to generate coherent questions in a grounded information-seeking task based on Battleship.
Abstract: One of the hallmarks of an intelligent agent is the ability to ask good questions. While facility with language is clearly a prerequisite, even in simple settings, LLMs can struggle to come up with questions that yield useful information---suggesting a failure of grounded reasoning. We study this phenomenon in a question-asking task based on the classic board game Battleship, where both text-only and multimodal LLMs perform far below human baselines. We propose a Bayesian model that combines an LLM-driven prior over questions with a probabilistic world model to facilitate coherent reasoning. We find that with a surprisingly modest sample budget for “mental computation,” our method is well-calibrated to human performance across varied Battleship board scenarios. Notably, this approach allows much smaller LLMs, such as CodeLlama-7b, to perform on par with GPT-4. These results support the emerging trend toward test-time inference as a scaling route for LLM reasoning, while highlighting the utility of probabilistic world models for grounding and structuring such computations.
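To make the architecture described in the abstract concrete, here is a minimal sketch (not the authors' code) of how an LLM-driven prior over candidate questions might be combined with a probabilistic world model via Monte Carlo "mental computation": sampled hidden boards are used to estimate each question's expected information gain, and that grounded score is added to a prior score. All names below (sample_board, answer, llm_log_prior, the toy ship model) are hypothetical stand-ins for illustration only.

```python
# Minimal sketch: score candidate Battleship questions by expected information
# gain (EIG) over Monte Carlo samples of hidden boards, combined with an
# LLM-derived prior. All components here are simplified placeholders.
import math
import random
from collections import Counter

def sample_board(rng, size=6):
    """Hypothetical world model: sample one hidden ship placement.
    (A real model would condition on all observations so far.)"""
    row, col = rng.randrange(size), rng.randrange(size - 2)
    return {(row, col + i) for i in range(3)}  # one horizontal ship of length 3

def answer(question, board):
    """Ground a symbolic question against a sampled board."""
    kind, arg = question
    if kind == "occupied?":        # e.g. ("occupied?", (2, 3))
        return arg in board
    if kind == "row_count":        # e.g. ("row_count", 2)
        return sum(1 for (r, _) in board if r == arg)
    raise ValueError(kind)

def expected_information_gain(question, boards):
    """Entropy of the answer distribution under the sampled boards:
    a harder-to-predict answer is a more informative question."""
    counts = Counter(answer(question, b) for b in boards)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def score(question, boards, llm_log_prior, weight=1.0):
    """Combine an LLM-driven prior over questions with grounded EIG."""
    return llm_log_prior(question) + weight * expected_information_gain(question, boards)

if __name__ == "__main__":
    rng = random.Random(0)
    boards = [sample_board(rng) for _ in range(200)]   # modest sample budget
    candidates = [("occupied?", (2, 3)), ("row_count", 2)]
    flat_prior = lambda q: 0.0                          # stand-in for an LLM prior
    best = max(candidates, key=lambda q: score(q, boards, flat_prior))
    print(best)
```

In this toy version the prior is flat; in the paper's setting the prior would come from an LLM proposing and scoring natural-language questions, while the world model supplies the grounded EIG term.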
Submission Number: 31