AutoRPA: Efficient GUI Automation through LLM-Driven Code Synthesis from Interactions

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, Venue: ICML 2026 regular, License: CC BY 4.0
Abstract: Large Language Model (LLM) based agents have demonstrated proficiency in multi-step interactions with graphical user interfaces (GUIs). While most research focuses on improving single-task performance, practical scenarios often involve repetitive GUI tasks, for which repeatedly invoking LLM reasoning at every step (i.e., the ReAct paradigm) is inefficient. Traditional Robotic Process Automation (RPA), which predates LLMs, offers runtime efficiency but demands significant manual effort to develop and maintain. To bridge this gap, we propose AutoRPA, a framework that automatically distills the decision logic of ReAct-style agents into robust RPA functions. AutoRPA introduces two core innovations: (1) a translator-builder pipeline, in which a translator agent converts hard-coded ReAct actions into soft-coded procedures and a builder agent synthesizes robust RPA functions via retrieval-augmented generation over multiple trajectories; and (2) a hybrid repair strategy during code verification, which combines RPA execution with ReAct-based fallback for iterative refinement. Experiments across multiple GUI environments demonstrate that RPA functions generated by AutoRPA successfully solve similar tasks while reducing token usage by 82% to 96%, significantly improving runtime efficiency and reusability.
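The distinction between hard-coded actions and soft-coded procedures can be illustrated with a minimal sketch. This is not AutoRPA's implementation: the `Element` type, `find_element`, `soft_click_target`, and the two toy screen layouts are all hypothetical names invented for illustration. The sketch only shows why a description-based lookup survives a layout change that breaks a fixed-coordinate click.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical UI element: a role, a visible label, and pixel bounds."""
    role: str
    label: str
    bounds: Tuple[int, int, int, int]  # left, top, right, bottom

    def center(self) -> Tuple[int, int]:
        left, top, right, bottom = self.bounds
        return ((left + right) // 2, (top + bottom) // 2)


def find_element(screen: List[Element], role: str, label: str) -> Element:
    """Soft-coded lookup: locate an element by description, not by position."""
    for element in screen:
        if element.role == role and element.label.lower() == label.lower():
            return element
    raise LookupError(f"no {role!r} labeled {label!r} on screen")


def soft_click_target(screen: List[Element], role: str, label: str) -> Tuple[int, int]:
    """Return the coordinates to click for the described element."""
    return find_element(screen, role, label).center()


# Hard-coded action as it might be recorded from one trajectory (assumed values).
HARD_CODED_CLICK = (200, 220)

# Toy layouts: in the second, a new button occupies the password field's old position.
old_layout = [Element("textfield", "Password", (100, 200, 300, 240))]
new_layout = [
    Element("button", "Help", (100, 200, 300, 240)),
    Element("textfield", "Password", (100, 320, 300, 360)),
]

old_target = soft_click_target(old_layout, "textfield", "password")
new_target = soft_click_target(new_layout, "textfield", "password")
```

On the old layout the soft-coded lookup resolves to the same point as the recorded click; on the new layout it follows the field to its new position, whereas the hard-coded coordinates would now land on the unrelated "Help" button.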
Lay Summary: Every day, millions of people perform repetitive tasks on computers and smartphones, such as filing workplace forms, extracting data, or booking flights. LLM agents can now complete these tasks by looking at screenshots and figuring out what to do step by step. However, having an LLM agent "think" through the exact same task every day is slow, inefficient, and expensive. On the other hand, classic automation software runs fast and cheap, but it requires human software engineers to manually write rigid scripts that break easily if a button moves or a layout changes slightly. To bridge this gap, we introduce AutoRPA, a framework that enables AI to automatically build its own reliable, low-cost automation programs. AutoRPA works like a smart observer: it lets a flexible AI agent solve a task once to demonstrate how it's done. Then, a "translator" AI turns specific actions (like clicking an exact spot on a screen) into description-based instructions (like finding a text field labeled "password," no matter where it is located). Finally, a "builder" AI compiles these instructions into clean, reusable code. If the newly generated code encounters an unexpected screen layout or a bug, the flexible AI steps back in to troubleshoot the issue at the exact point of failure and teaches the program how to fix itself. In tests across various computer and smartphone environments, AutoRPA successfully completed complex tasks while cutting AI computing costs (token usage) by 82% to 96%. By combining the flexibility of modern AI with the speed and low cost of classic software automation, AutoRPA makes digital tasks faster, cheaper, and completely hands-free.
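The fallback behavior described above (the flexible agent stepping in at the exact point of failure) can be sketched roughly as follows. This is an illustrative assumption about the control flow, not the paper's code: `run_with_fallback`, the three toy steps, and `fake_react_agent` (a stand-in for a real LLM-driven agent) are hypothetical. The returned log records which steps needed the fallback, which is the kind of signal a repair loop could use to refine the generated function.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Step = Callable[[State], State]
Fallback = Callable[[int, State, Exception], State]


def run_with_fallback(steps: List[Step], fallback: Fallback, state: State) -> Tuple[State, List[str]]:
    """Run cheap scripted steps; hand a failing step to the fallback agent.

    Returns the final state and a log noting who handled each step.
    """
    log: List[str] = []
    for index, step in enumerate(steps):
        try:
            state = step(state)
            log.append(f"rpa:{index}")
        except Exception as error:
            # Assumed behavior: the agent resolves the failing step in place,
            # and the failure point is logged for later repair of the script.
            state = fallback(index, state, error)
            log.append(f"react:{index}")
    return state, log


def open_form(state: State) -> State:
    return {**state, "form_open": True}


def fill_password(state: State) -> State:
    # Scripted step that fails when the expected field is not visible.
    if not state.get("password_field_visible"):
        raise LookupError("password field not found")
    return {**state, "password_filled": True}


def submit(state: State) -> State:
    return {**state, "submitted": bool(state.get("password_filled"))}


def fake_react_agent(step_index: int, state: State, error: Exception) -> State:
    """Stand-in for an LLM agent: pretends to locate and fill the field."""
    return {**state, "password_field_visible": True, "password_filled": True}


final_state, log = run_with_fallback([open_form, fill_password, submit], fake_react_agent, {})
```

In this toy run only the second step invokes the (expensive) fallback; the first and third run as plain scripted code, which is the source of the efficiency gain the summary describes.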
Originally Submitted Supplementary Material: zip
Primary Area: Deep Learning->Large Language Models
Keywords: LLM Agent, GUI Agent, GUI Automation, RPA
Originally Submitted PDF: pdf
Submission Number: 6852