Automatic agent chaining for multimodal task support

Published: 24 Sept 2025, Last Modified: 24 Sept 2025, NeurIPS 2025 LLM Evaluation Workshop Poster, CC BY 4.0
Keywords: Agentic AI, Planning, Task Guidance System
TL;DR: An agentic system and multimodal dataset for task assistance
Abstract: The future of human-computer interaction is moving toward systems where Large Language Models (LLMs) act as autonomous agents, capable of self-planning and adapting to complex, domain-specific tasks. However, a significant gap remains in developing agentic architectures that integrate seamlessly into real-world, multimodal task support systems. We present our initial work on a novel agentic architecture for process task guidance, designed to assist human technicians in complex physical tasks. Our system performs automatic agent chaining via a dynamic planner that recruits specialized agents to solve sub-tasks. To evaluate this approach, we collected a novel multimodal dataset of human-agent interactions during a toy assembly task and benchmarked our agentic system against a non-agentic baseline. Our findings show that the agentic solution significantly improves response quality and reduces incorrect outputs.
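
The abstract does not give implementation details, but the following is a minimal sketch of what automatic agent chaining via a dynamic planner could look like. All class, function, and agent names (DynamicPlanner, AgentResult, "vision", "instruction") are hypothetical illustrations under assumed behavior, not the authors' implementation.

```python
# Hypothetical sketch of agent chaining via a dynamic planner.
# Names and routing logic are illustrative assumptions only.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentResult:
    """Output of one specialized agent for one sub-task."""
    agent_name: str
    output: str


# A specialized agent is modeled as a function of (query, prior results) -> text.
AgentFn = Callable[[str, List[AgentResult]], str]


@dataclass
class DynamicPlanner:
    """Recruits specialized agents and chains them for a user query."""
    agents: Dict[str, AgentFn] = field(default_factory=dict)

    def register(self, name: str, fn: AgentFn) -> None:
        self.agents[name] = fn

    def plan(self, query: str) -> List[str]:
        # Placeholder keyword routing; a real planner would likely use an
        # LLM to decide which agents to recruit and in what order.
        names: List[str] = []
        if "image" in query or "assembly" in query:
            names.append("vision")
        names.append("instruction")
        return names

    def run(self, query: str) -> List[AgentResult]:
        # Chain agents: each agent sees the outputs of earlier agents.
        results: List[AgentResult] = []
        for name in self.plan(query):
            output = self.agents[name](query, results)
            results.append(AgentResult(name, output))
        return results


if __name__ == "__main__":
    planner = DynamicPlanner()
    planner.register("vision", lambda q, ctx: "detected: toy base plate, 4 screws")
    planner.register(
        "instruction",
        lambda q, ctx: f"next step given {ctx[-1].output if ctx else q}",
    )
    for r in planner.run("image of current toy assembly state"):
        print(r.agent_name, "->", r.output)
```

In this sketch the chaining is "automatic" in the sense that the planner, not the user, decides which agents run and in what order, and each recruited agent receives the accumulated outputs of the agents before it.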
Submission Number: 75