Do LLMs have an Anti-exception Reasoning Ability for Planning

ACL ARR 2025 May Submission 1639 Authors

18 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Human-like planning aims to predict an action sequence for a given task. Existing studies have demonstrated the potential of Large Language Models (LLMs) for human-like planning. However, it has not been verified whether LLMs can overcome exceptional situations. We therefore carry out a preliminary study on the Anti-Exception Planning (AEP) task. Specifically, we build AEP datasets using semi-artificial and automatic labeling approaches. On this basis, we evaluate the AEP performance of different LLMs (Vicuna, Qwen, LLaMA, GPT-4o and DeepSeek-R1) within the Generation-Retrieval-Ranker (GRR) framework. In addition, we propose a reverse engineering approach to enhance GRR. Experiments show that LLMs struggle to handle exceptions: in the worst case, the success rate of the exception attack reaches 93.64%, although the reverse engineering-based GRR yields substantial improvements. We will make all datasets publicly available to support future studies.

Paper Type: Short

Research Area: Resources and Evaluation

Research Area Keywords: NLP datasets

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: English

Keywords: NLP datasets

Submission Number: 1639
