OpenDocAssistant: Language-Driven Document Automation and Evaluation

ACL ARR 2025 May Submission7296 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Modern document processing tools remain inaccessible to non-technical users because of their steep learning curves. This paper introduces OpenDocAssistant, a natural language-driven document automation system that addresses three core challenges: multi-step instruction decomposition, semantic-to-API mapping, and efficient execution under resource constraints. Its three-stage architecture (planning, API selection, and execution) uses large language models (LLMs) to translate free-form instructions into document operations. The novel RaAPI mechanism combines dense embedding retrieval with LLM reasoning to map natural language instructions to appropriate API calls. Ablation studies show that RaAPI is critical (performance drops from 74.53% to 12.40% without it) and that the system handles vague instructions robustly (>0.86 consistency, >0.95 API similarity). We evaluate 10 LLMs on OpenDocEval (110 annotated sessions) using Achievement Rate (AR) and Average Number of APIs (ANA). Large models achieve 74.53% AR on complex tasks, while smaller models offer practical accuracy–efficiency trade-offs. This work demonstrates the potential of LLMs to democratize document automation through natural language interfaces.
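To illustrate the general idea of retrieval followed by LLM selection that the abstract attributes to RaAPI, here is a minimal sketch. It is not the authors' implementation: the API catalogue, the function names (`embed`, `retrieve`, `build_prompt`), and the bag-of-words vectors standing in for a dense embedding model are all assumptions made for illustration, and the LLM call itself is left out.

```python
import math
from collections import Counter

# Hypothetical API catalogue; names and descriptions are illustrative only.
API_DOCS = {
    "set_font_size": "change the font size of selected text",
    "insert_table": "insert a table with rows and columns",
    "set_bold": "make selected text bold",
    "add_heading": "add a heading or title to the document",
}

def embed(text):
    """Toy bag-of-words vector standing in for a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(instruction, k=2):
    """Return the k API names whose descriptions best match the instruction."""
    query = embed(instruction)
    ranked = sorted(
        API_DOCS,
        key=lambda name: cosine(query, embed(API_DOCS[name])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(instruction, candidates):
    """Assemble the shortlist into a prompt an LLM could answer with one API name."""
    listing = "\n".join(f"- {name}: {API_DOCS[name]}" for name in candidates)
    return (
        f"Instruction: {instruction}\n"
        f"Candidate APIs:\n{listing}\n"
        "Choose the single most appropriate API."
    )

instruction = "insert a table with three rows"
candidates = retrieve(instruction)
prompt = build_prompt(instruction, candidates)
```

In this sketch the retrieval step only narrows the catalogue to a short candidate list; the final choice would be delegated to an LLM given `prompt`, which keeps the prompt small even when the catalogue is large.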
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Document Automation, Large Language Models (LLMs), Retrieval-Augmented API Selection (RaAPI), Evaluation Benchmark, Natural Language Interface
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: Chinese, English
Submission Number: 7296