Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
TL;DR: We propose PAE, a framework that automatically discovers and learns web navigation skills without manual instructions.
Abstract: A generalist foundation model agent needs a large and diverse skill repertoire, such as finding directions between two travel locations and buying specific items from the Internet. If each skill must be specified manually through a fixed set of human-annotated instructions, the agent’s skill repertoire will necessarily be limited by the poor scalability of human annotation. In this work, we address this challenge by proposing Proposer-Agent-Evaluator (PAE), an effective learning system that enables foundation model agents to autonomously discover and practice skills in the wild. A context-aware task proposer generates instructions based on website information, the agent policy attempts those tasks in the real world, and the resulting trajectories are assessed by an autonomous VLM-based success evaluator. The success evaluation serves as the reward signal for the agent to refine its policy through RL. We validate PAE on challenging vision-based web navigation, using both real-world and self-hosted websites from WebVoyager and WebArena. Our results show that PAE significantly improves the zero-shot generalization of VLM Internet agents to both unseen tasks and websites (around 50% relative improvement).
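The abstract describes a three-component loop: propose tasks, attempt them, evaluate the outcomes, and use the evaluations as RL rewards. Below is a minimal sketch of that loop in Python. All names here (ContextAwareProposer-style `proposer`, `agent`, `evaluator`, `rl_update`, `Trajectory`) are hypothetical placeholders for illustration, not the actual API of the amazon-science/PAE repository.

```python
# Hypothetical sketch of one PAE round, following the loop described in the abstract.
# None of these interfaces are taken from the PAE codebase; they only illustrate the
# proposer -> agent -> evaluator -> RL-update structure.

from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Trajectory:
    instruction: str                      # task proposed for a given website
    observations: List[Any] = field(default_factory=list)  # e.g. screenshots seen during the rollout
    actions: List[Any] = field(default_factory=list)       # web actions taken by the agent
    reward: float = 0.0                   # filled in by the success evaluator


def pae_iteration(websites, proposer, agent, evaluator, rl_update):
    """One round of autonomous skill discovery:
    propose tasks -> attempt them in the wild -> evaluate success -> refine the policy."""
    trajectories: List[Trajectory] = []
    for site in websites:
        # 1. Context-aware task proposer: generate instructions from website information.
        instructions = proposer.propose_tasks(site)
        for instruction in instructions:
            # 2. Agent policy attempts the proposed task on the real website.
            observations, actions = agent.rollout(site, instruction)
            traj = Trajectory(instruction, observations, actions)
            # 3. Autonomous VLM-based success evaluator scores the trajectory;
            #    this score is the only reward signal the agent receives.
            traj.reward = evaluator.score(instruction, observations)
            trajectories.append(traj)
    # 4. The reward signal drives policy refinement through RL
    #    (for instance, reward-filtered behavior cloning or a policy-gradient step).
    rl_update(agent, trajectories)
    return trajectories
```

Because the proposer, agent, and evaluator only interact through instructions, rollouts, and scalar rewards, each component can be swapped or scaled independently; the concrete models and RL algorithm used in the paper are specified in the linked repository.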
Lay Summary: We want AI agents, such as virtual assistants, to handle many tasks like navigating websites or shopping online without needing detailed instructions from humans each time. But manually writing instructions for every task isn't scalable. To solve this, we created a method called Proposer-Agent-Evaluator (PAE), which allows AI agents to learn new skills autonomously and improve themselves over time. In PAE, one component suggests tasks based on information from websites. Another component, the agent itself, attempts these tasks on real websites. A third component evaluates how successful the agent was using visual feedback, guiding the agent to improve its performance automatically through trial and error. We tested PAE on challenging website navigation tasks in both real and simulated environments. Our experiments showed that agents trained with PAE handled completely new tasks and websites significantly better (by about 50%) than previous methods, highlighting the potential of AI agents to autonomously acquire new skills and self-improve.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/amazon-science/PAE
Primary Area: Deep Learning->Large Language Models
Keywords: Web agents, LLM Agents, Autonomous skill discovery
Submission Number: 7653