Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?

Published: 27 Oct 2023, Last Modified: 24 Apr 2024, ICBINB 2023
Keywords: large language models, visual foundation models, diagrammatic reasoning
TL;DR: Analyses of a framework that augments LLM reasoning with a visual scratchpad and visual foundation model.
Abstract: When humans reason about complex text-based questions, we often leverage diagrammatic abstractions drawn on a visual scratchpad. In this paper, we introduce and explore the capabilities of Visual-Scratchpad, a method that augments a *large language foundation model* (LLM) with diagrammatic execution and readout. We enable the LLM to generate drawing commands and to read out abstractions from the resulting picture. The visual readout operation uses a *visual foundation model*, optionally finetuned with expert iteration. We show that although Visual-Scratchpad outperforms an inference-only LLM, it surprisingly performs worse than a single finetuned LLM. Our experiments suggest that this gap is due to a failure mode of visual foundation models: they struggle to understand the abstractions in diagrams.
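The abstract describes a generate–render–readout loop. The following is a minimal sketch of that loop under stated assumptions: every function name, interface, and stub body here is hypothetical, standing in for the paper's actual LLM, renderer, and visual foundation model rather than reproducing them.

```python
# Hypothetical sketch of the Visual-Scratchpad loop described in the abstract.
# All function names and stub implementations are assumptions for illustration,
# not the authors' actual API or models.

def llm_propose_commands(question: str) -> list[str]:
    # Stand-in for the LLM generating drawing commands from the text question.
    return [f"draw_node {tok}" for tok in question.split()[:3]]

def render(commands: list[str]) -> list[list[int]]:
    # Stand-in renderer: executes drawing commands onto a picture
    # (here, a toy 4x4 grid instead of a real canvas).
    canvas = [[0] * 4 for _ in range(4)]
    for i, _cmd in enumerate(commands):
        canvas[i % 4][i % 4] = 1
    return canvas

def vfm_readout(picture: list[list[int]]) -> str:
    # Stand-in visual foundation model: reads an abstraction off the picture.
    marks = sum(cell for row in picture for cell in row)
    return f"{marks} marked cells"

def visual_scratchpad(question: str) -> str:
    # One round of diagrammatic execution and readout; the LLM would then
    # answer conditioned on the read-out abstraction.
    commands = llm_propose_commands(question)
    picture = render(commands)
    abstraction = vfm_readout(picture)
    return f"Answer conditioned on: {abstraction}"

print(visual_scratchpad("is A connected to B"))
```

The paper's negative result would localize to the `vfm_readout` step: the visual model fails to recover the intended abstraction from the rendered diagram.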
Submission Number: 6