CAPTAIN: Continuous Automated Planning Through Autonomous Internet Navigation

Adithya S Kolavi

Published: 13 Dec 2024, Last Modified: 02 Mar 2025, Venue: LM4Plan, License: CC0 1.0

Keywords: Web Automation, Large Language Models, Planning Systems, Natural Language Interfaces, Browser Automation, Task Decomposition, State Machine Architecture, Human-AI Interaction, Autonomous Systems, Web Navigation

TL;DR: CAPTAIN enables users to automate complex web tasks using natural language commands by combining LLM-driven planning with a robust state-machine architecture that handles dynamic web environments.

Abstract: Web automation has traditionally relied on brittle scripts that demand technical expertise and adapt poorly to dynamic web environments. This paper introduces CAPTAIN (Continuous Automated Planning Through Autonomous INternet navigation), a system that bridges natural language understanding and web automation through LLM-driven planning. CAPTAIN uses Large Language Models (LLMs) to decompose and plan tasks, so users can automate complex web tasks with natural language commands, while a state-machine-driven architecture keeps execution reliable. Our system implements three key innovations: (1) a modular action framework that decomposes web tasks into atomic operations with built-in error recovery, (2) a memory-augmented execution pipeline that maintains task context across multiple states, and (3) a hierarchical planning system that continuously adapts to dynamic web environments. In an evaluation on six representative web automation tasks, CAPTAIN performs robustly across multiple LLM configurations, from state-of-the-art models to smaller, more accessible variants. Our results show that a planning framework of this kind can connect natural language understanding to reliable task execution, with consistent performance across diverse web automation scenarios.
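The abstract names a state machine, atomic actions with built-in error recovery, and task memory carried across states, but does not give their concrete form. The sketch below shows one minimal way these pieces could fit together; it is not CAPTAIN's implementation. The states (`PLAN`, `EXECUTE`, `DONE`, `FAILED`), the `Action` and `Memory` types, the retry and re-plan limits, and all helper names are hypothetical. The LLM planner is replaced by a fixed stub (`toy_planner`), the browser by a fake executor that fails a configurable number of clicks, and the hierarchical planner is not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class State(Enum):
    PLAN = auto()
    EXECUTE = auto()
    DONE = auto()
    FAILED = auto()


@dataclass
class Action:
    """Hypothetical atomic web operation (e.g. navigate, type, click)."""
    name: str
    target: str

    def key(self) -> str:
        return f"{self.name}:{self.target}"


@dataclass
class Memory:
    """Hypothetical task context carried across every state transition."""
    goal: str
    completed: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def toy_planner(memory: Memory) -> list[Action]:
    """Stand-in for an LLM planner: returns a fixed decomposition of the
    goal minus the steps already recorded as completed in memory."""
    steps = [
        Action("navigate", "https://example.com"),
        Action("type", "search box"),
        Action("click", "submit button"),
    ]
    return [a for a in steps if a.key() not in memory.completed]


def run(
    goal: str,
    planner: Callable[[Memory], list[Action]],
    execute: Callable[[Action], None],
    max_retries: int = 2,
    max_replans: int = 3,
) -> tuple[State, Memory]:
    memory = Memory(goal)
    state = State.PLAN
    plan: list[Action] = []
    replans = 0
    while state not in (State.DONE, State.FAILED):
        if state is State.PLAN:
            plan = planner(memory)
            # An empty plan means every step is already recorded in memory.
            state = State.EXECUTE if plan else State.DONE
        else:  # State.EXECUTE
            action = plan.pop(0)
            for _ in range(max_retries + 1):
                try:
                    execute(action)
                    memory.completed.append(action.key())
                    break
                except RuntimeError as exc:
                    memory.errors.append(f"{action.name}: {exc}")
            else:
                # Retries exhausted: fall back to re-planning from memory.
                replans += 1
                state = State.FAILED if replans > max_replans else State.PLAN
                continue
            if not plan:
                state = State.PLAN  # ask the planner whether anything remains
    return state, memory


def make_flaky_executor(failures: int) -> Callable[[Action], None]:
    """Fake browser: the first `failures` clicks raise, later ones succeed."""
    remaining = [failures]

    def execute(action: Action) -> None:
        if action.name == "click" and remaining[0] > 0:
            remaining[0] -= 1
            raise RuntimeError("element not found")

    return execute


if __name__ == "__main__":
    state, memory = run("search example.com", toy_planner, make_flaky_executor(1))
    print(state.name)     # DONE
    print(memory.errors)  # ['click: element not found']
```

In this sketch, local retries absorb transient failures such as a late-loading element, while re-planning consults the planner with the accumulated memory, so only unfinished steps are re-issued. A bounded re-plan count keeps the loop from cycling forever on a step that never succeeds.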

Submission Number: 41