Abstract: We demonstrate the ability of large language
models (LLMs) to perform iterative self-improvement of robot
policies. An important insight of this paper is that LLMs have
a built-in ability to perform (stochastic) numerical optimization
and that this property can be leveraged for explainable robot
policy search. Based on this insight, we introduce the SAS
Prompt (Summarize, Analyze, Synthesize) – a single prompt
that enables iterative learning and adaptation of robot behavior
by combining the LLM’s abilities to retrieve, reason, and optimize
over previous robot traces to synthesize new, unseen
behavior. Our approach can be regarded as an early example
of a new family of explainable policy search methods that
are entirely implemented within an LLM. We evaluate our
approach both in simulation and on a real-robot table tennis
task. Project website: sites.google.com/asu.edu/sas-llm/
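The iterative loop sketched in the abstract — repeatedly prompting an LLM with past rollouts and asking it to summarize, analyze, and synthesize improved policy parameters — can be illustrated as follows. This is a hypothetical sketch, not the authors' implementation: the prompt template, the toy reward, and the `mock_llm` stand-in (which simply nudges the best-so-far parameter, in place of parsing a real model's answer) are all illustrative assumptions.

```python
# Hypothetical sketch of an SAS-style (Summarize, Analyze, Synthesize)
# self-improvement loop. Names and logic are illustrative assumptions.

def sas_prompt(traces):
    """Build a single prompt asking the LLM to (1) summarize past rollouts,
    (2) analyze which parameters improved reward, (3) synthesize new ones."""
    lines = [f"params={p:.3f} -> reward={r:.3f}" for p, r in traces]
    return ("Summarize the rollouts below, analyze the trend, and "
            "synthesize improved parameters:\n" + "\n".join(lines))

def mock_llm(prompt, traces):
    # Stand-in for a real LLM acting as a stochastic numerical optimizer:
    # nudge the best-seen parameter uphill. A real system would instead
    # parse the model's textual answer to the SAS prompt.
    best_p, _ = max(traces, key=lambda t: t[1])
    return best_p + 0.1

def reward(p):
    # Toy objective standing in for a robot-policy return (maximum at p=2).
    return -(p - 2.0) ** 2

traces = [(0.0, reward(0.0))]
for _ in range(25):  # iterative self-improvement loop
    prompt = sas_prompt(traces)
    p_new = mock_llm(prompt, traces)
    traces.append((p_new, reward(p_new)))

best_p, best_r = max(traces, key=lambda t: t[1])
```

Each iteration re-prompts with the full trace history, so the LLM both explains (summarize/analyze) and improves (synthesize) the policy in a single call — the "explainable policy search" property the abstract emphasizes.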