Agent Harness Engineering: A Survey

Published: 14 May 2026, Last Modified: 14 May 2026 | OpenReview Archive Direct Upload | Everyone | arXiv.org perpetual, non-exclusive license
Abstract: The rapid deployment of large language model (LLM) agents in production has revealed a recurring pattern: task execution reliability depends less on the underlying model than on the infrastructure layer that wraps it, the agent execution harness. This survey provides a practice-grounded, systematic treatment of agent harness engineering, organized around three claims. First, the agent harness is an independent system layer whose engineering quality drives a large share of real-world reliability. We develop this position through a three-phase engineering evolution from prompt engineering to context engineering to harness engineering; a cross-layer synthesis covering the cost–quality–speed trilemma, the capability–control tradeoff, and the harness coupling problem; and an open-problem agenda grounded in both research gaps and production pain points. Second, we propose ETCLOVG, a seven-layer taxonomy (Execution environment, Tool interface, Context management, Lifecycle/Orchestration, Observability, Verification, Governance) that extends prior six-component frameworks by treating observability and governance as independent architectural concerns. Third, we map 170+ open-source projects onto this taxonomy to expose ecosystem patterns, coverage gaps, and emerging design principles, and we present engineering principles, distilled from production deployments at OpenAI, Anthropic, and LangChain, that address the gap between practitioner knowledge and research vocabulary.