Do as I Say, Not as I Do: The Role of LLMs in Open World Agents

Published: 17 Sept 2025, Last Modified: 06 Nov 2025 · ACS 2025 Poster · CC BY 4.0
Keywords: agent, agentic, LLM, plan execution, cognitive architecture, hybrid representation
TL;DR: LLMs can't keep track of what they are doing so we have to find the right role for them in agents.
Abstract: What is the right role for LLMs in agents? We describe here results and observations from efforts to use Large Language Models (LLMs) as integral components of an agent architecture to be deployed in an open environment and guided by a human operator. Our agent starts with high-level tasking and the procedural knowledge to execute that tasking under normal circumstances. Our baseline configuration uses a single, prompted LLM as the agent. In this role we find critical (and inherent?) limitations in the LLM context memory model: LLMs as agents can’t consistently track plan execution state, nor can they track changing world state. We do, however, find central roles in agent architectures where an LLM can provide significant value during procedure execution – updating the procedure; noticing, characterizing, and adapting to anomalies; testing for conditions and action success; etc. We advocate for an architectural approach where programmed components scaffold the execution state of the agent and help with context tracking, but where LLMs do most of the “reasoning”. To both scaffold and exploit LLMs effectively, we also advocate a hybrid knowledge representation made up of formal frames or schemas, many of whose slots are filled with natural language instead of symbolic structure. We also speculate on the reasons that the limitations (and strengths) of LLMs may be inherent to the transformer architecture and the nature of LLM unsupervised training. We conclude that a likely role for LLMs in agentic systems is at the heart of each reasoning process in the agent, but not as a stand-alone agent.
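The hybrid representation described above – a formal frame whose execution state is tracked by programmed components while many slots hold natural language for an LLM to interpret – might be sketched as follows. This is an illustrative sketch, not the authors' implementation; the class names, slot names, and status values are all assumptions for the sake of the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProcedureStep:
    """One step in a procedure frame: a symbolic id and status for the
    programmed executive, plus natural-language slots for the LLM."""
    step_id: str                 # symbolic identifier (tracked formally)
    status: str = "pending"      # pending | active | done | failed
    action: str = ""             # natural-language action description
    success_condition: str = ""  # natural-language test, judged by an LLM


@dataclass
class ProcedureFrame:
    """Formal frame scaffolding execution state. The programmed code
    advances `current`; an LLM reasons over the NL slot contents."""
    task: str
    steps: list[ProcedureStep] = field(default_factory=list)
    current: int = 0

    def advance(self) -> None:
        # A programmed component, not the LLM, tracks plan-execution state.
        self.steps[self.current].status = "done"
        self.current += 1
        if self.current < len(self.steps):
            self.steps[self.current].status = "active"


# Hypothetical example tasking: the procedure content is plain language,
# but step order and status live in the formal scaffold.
frame = ProcedureFrame(
    task="Refuel the vehicle",
    steps=[
        ProcedureStep("s1", "active", "Drive to the nearest fuel station",
                      "The vehicle is parked at a pump"),
        ProcedureStep("s2", "pending", "Fill the tank",
                      "The fuel gauge reads full"),
    ],
)
frame.advance()
print(frame.current, frame.steps[1].status)  # → 1 active
```

In this division of labor, the LLM would be consulted to judge whether `success_condition` holds, to revise an `action` when an anomaly is noticed, or to propose new steps – while the frame itself, not the LLM's context window, remains the authoritative record of where the agent is in its plan.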
Paper Track: Technical paper
Submission Number: 35