CAPTAIN: Continuous Automated Planning Through Autonomous Internet Navigation

Published: 13 Dec 2024, Last Modified: 02 Mar 2025, Venue: LM4Plan, License: CC0 1.0
Keywords: Web Automation, Large Language Models, Planning Systems, Natural Language Interfaces, Browser Automation, Task Decomposition, State Machine Architecture, Human-AI Interaction, Autonomous Systems, Web Navigation
TL;DR: CAPTAIN enables users to automate complex web tasks using natural language commands by combining LLM-driven planning with a robust state machine architecture to handle dynamic web environments
Abstract: Web automation has traditionally relied on brittle scripting approaches that demand technical expertise and lack adaptability to dynamic web environments. This paper introduces CAPTAIN (Continuous Automated Planning Through Autonomous INternet navigation), a novel system that bridges natural language understanding and web automation through sophisticated planning techniques. By leveraging Large Language Models (LLMs) for task decomposition and planning, CAPTAIN enables users to automate complex web tasks using natural language commands while maintaining reliability through a state machine-driven architecture. Our system implements three key innovations: (1) a modular action framework that decomposes web tasks into atomic operations with built-in error recovery, (2) a memory-augmented execution pipeline that maintains task context across multiple states, and (3) a hierarchical planning system that enables continuous adaptation to dynamic web environments. Through comprehensive evaluation across six representative web automation tasks, CAPTAIN demonstrates robust performance across multiple LLM configurations, from state-of-the-art models to smaller, more accessible variants. Our results show that effective web automation can be achieved through sophisticated planning frameworks that bridge the gap between natural language understanding and reliable task execution, while maintaining consistent performance across diverse web automation scenarios.
Submission Number: 41
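
The abstract names three ingredients: atomic actions with built-in error recovery, a memory that carries task context across states, and a planner that can adapt during execution, all driven by a state machine. The sketch below is one possible reading of how those pieces could fit together. It is not taken from the paper: the names `State`, `Action`, `Agent`, `planner`, `max_retries`, and `max_replans` are hypothetical, and the planner is a plain callable standing in for an LLM call.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class State(Enum):
    PLAN = auto()
    EXECUTE = auto()
    DONE = auto()
    FAILED = auto()


@dataclass
class Action:
    """One atomic operation (hypothetical shape, not CAPTAIN's actual API)."""
    name: str
    run: Callable[[Dict[str, str]], str]  # reads shared memory, returns a result
    max_retries: int = 2                  # assumed local error-recovery budget


@dataclass
class Agent:
    # Stand-in for the LLM planner: maps (task, memory) to a list of actions.
    planner: Callable[[str, Dict[str, str]], List[Action]]
    memory: Dict[str, str] = field(default_factory=dict)
    max_replans: int = 3

    def run(self, task: str) -> State:
        state, queue, replans = State.PLAN, [], 0
        while state not in (State.DONE, State.FAILED):
            if state is State.PLAN:
                # Planning sees the memory, so a replan can use earlier results.
                queue = list(self.planner(task, self.memory))
                state = State.EXECUTE
            elif state is State.EXECUTE:
                if not queue:
                    state = State.DONE
                    continue
                action = queue.pop(0)
                if self._attempt(action):
                    continue
                # Retries exhausted: go back to planning, up to a fixed limit.
                replans += 1
                state = State.PLAN if replans <= self.max_replans else State.FAILED
        return state

    def _attempt(self, action: Action) -> bool:
        """Run one action with retries, recording the outcome in memory."""
        for _ in range(action.max_retries + 1):
            try:
                self.memory[action.name] = action.run(self.memory)
                return True
            except Exception as exc:
                self.memory["last_error"] = f"{action.name}: {exc}"
        return False
```

In this reading, recovery happens at two levels: an action is retried locally first, and only when its retries run out does control return to the `PLAN` state. Real browser actions (clicking, typing, navigating) would replace the `run` callables.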