Abstract: Human-like planning aims to predict an action sequence for a given task. Existing studies have demonstrated the potential of Large Language Models (LLMs) for human-like planning. However, it has not been verified whether LLMs are capable of overcoming exceptional situations. We therefore carry out a preliminary study of the Anti-Exception Planning (AEP) task. Specifically, we build AEP datasets using semi-artificial and automatic labeling approaches. On this basis, we evaluate the AEP performance of different LLMs (Vicuna, Qwen, LLaMA, GPT-4o, and DeepSeek-R1) within the Generation-Retrieval-Ranker (GRR) framework. In addition, we propose a reverse engineering approach to enhance GRR. Experiments show that LLMs struggle to handle exceptions: the success rate of exception attacks reaches 93.64% in the worst case, although the reverse engineering-based GRR yields substantial improvements. We will make all datasets publicly available to support future studies.
Paper Type: Short
Research Area: Resources and Evaluation
Research Area Keywords: NLP datasets
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English
Keywords: NLP datasets
Submission Number: 1639