Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents

Yaxin Luo; Zhaoyi Li; Jiacheng Liu; Jiacheng Cui; Xiaohan Zhao; Zhiqiang Shen

Published: 18 Sept 2025, Last Modified: 30 Oct 2025

Venue: NeurIPS 2025 Datasets and Benchmarks Track (poster)

License: CC BY 4.0

Keywords: MLLM Agents, Multimodal Agent Benchmark, CAPTCHAs

TL;DR: We present Open CaptchaWorld, a benchmark that tests multimodal LLM agents on solving real-world CAPTCHAs via multi-step reasoning and interaction, revealing large gaps between current models and human performance.

Abstract: CAPTCHAs have been a critical bottleneck for deploying web agents in real-world applications, often blocking them from completing end-to-end automation tasks. While modern multimodal LLM agents have demonstrated impressive performance in static perception tasks, their ability to handle interactive, multi-step reasoning challenges like CAPTCHAs is largely untested. To address this gap, we introduce **Open CaptchaWorld**, the first web-based benchmark and platform specifically designed to evaluate the visual reasoning and interaction capabilities of MLLM-powered agents through diverse and dynamic CAPTCHA puzzles. Our benchmark spans 20 modern CAPTCHA types, totaling 225 CAPTCHAs, annotated with a new metric we propose: CAPTCHA Reasoning Depth, which quantifies the number of cognitive and motor steps required to solve each puzzle. Experimental results show that while humans consistently achieve near-perfect scores, state-of-the-art MLLM agents struggle significantly: the highest success rate is **40.0%**, achieved by Browser-Use OpenAI-o3, far below the human-level performance of **93.3%**. This highlights Open CaptchaWorld as a vital benchmark for diagnosing the limits of current multimodal agents and guiding the development of more robust multimodal reasoning systems.

Croissant File: json

Dataset URL: https://huggingface.co/datasets/OpenCaptchaWorld/Open_CaptchaWorld

Code URL: https://github.com/MetaAgentX/OpenCaptchaWorld

Primary Area: Datasets & Benchmarks for applications in language modeling and vision language modeling

Submission Number: 81
