Abstract: Large language models (LLMs) have been widely used as procedural planners, providing step-by-step guidance across applications.
However, in human-assistive scenarios where the environment and users' knowledge constantly change, their ability to identify different step types for generating alternative plans remains under-explored. To fill this gap, we assess whether models can identify steps that are:
(i) sequential, (ii) interchangeable, and (iii) optional in textual instructions. We compare LLMs to two vision-aware models relevant for procedural understanding: a large vision-language model and a heuristic approach that uses video-mined knowledge graphs.
Our results indicate that LLMs struggle to capture the notion of mutual exclusivity between sequential and interchangeable steps.
Furthermore, we report comprehensive analyses highlighting the advantages and limitations of using LLMs as procedural task guides.
While the largest LLM shows expert-level task knowledge, our findings reveal its limitations in several key areas: broad task coverage, robustness to diverse user phrasings, and physical reasoning.
Paper Type: Long
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Research Area Keywords: multimodality, semantic relationships, knowledge tracing/discovering/inducing, robustness
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 947