Keywords: llm, agent, code actions, code generation
TL;DR: We propose a new language for LLM agents to use for actions, and we show its benefits over Python in terms of performance, reliability, and security.
Abstract: Modern large language models (LLMs) are often deployed as agents, calling external tools adaptively to solve tasks. Rather than directly calling tools, it can be more effective for LLMs to write code to perform the tool calls, enabling them to automatically generate complex control flow such as conditionals and loops. Such code actions are typically provided as Python code, since LLMs are quite proficient at it; however, Python may not be the ideal language due to limited built-in support for performance, security, and reliability. We propose a novel programming language for code actions, called QUASAR, which has several benefits: (1) automated parallelization to improve performance, (2) uncertainty quantification to improve reliability and mitigate hallucinations, and (3) security features enabling the user to validate actions. LLMs can write code in a subset of Python, which is automatically transpiled to QUASAR. We evaluate our approach on the ViperGPT and CaMeL agents, applied to the GQA visual question answering and AgentDojo AI assistant datasets, demonstrating that LLMs with QUASAR actions instead of Python actions retain strong performance, while reducing execution time by up to 56%, improving security by reducing user approvals by up to 53%, and improving reliability by applying conformal prediction to achieve a desired target coverage level.
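The abstract mentions applying conformal prediction to reach a target coverage level. As a rough illustration of that general technique only, here is a minimal sketch of standard split conformal prediction for a classifier-style tool output. It is not taken from QUASAR: the function names, the nonconformity score (one minus the predicted probability), and the calibration values are all assumptions chosen for the example.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split-conformal threshold for miscoverage level alpha (hypothetical helper).

    calibration_scores holds nonconformity scores of the true answers on a
    held-out calibration set; larger means the model was less confident.
    """
    n = len(calibration_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every label.
        return math.inf
    return sorted(calibration_scores)[rank - 1]


def prediction_set(label_probs, threshold):
    """Keep every label whose score 1 - p does not exceed the threshold."""
    return {label for label, p in label_probs.items() if 1.0 - p <= threshold}


# Illustrative calibration scores (assumed values, not from the paper).
calibration_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.15, 0.25, 0.35, 0.45]
threshold = conformal_threshold(calibration_scores, alpha=0.25)
answers = prediction_set({"cat": 0.70, "dog": 0.25, "fox": 0.05}, threshold)
```

Under the usual exchangeability assumption, sets built this way contain the true answer with probability at least 1 - alpha, which is the sense in which a target coverage level can be set in advance.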
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 14018