Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

ACL ARR 2025 February Submission 1209 Authors

13 Feb 2025 (modified: 09 May 2025) · ACL ARR 2025 February Submission · CC BY 4.0
Abstract: Although Large Language Models (LLMs) have demonstrated strong $\textbf{instruction-following}$ ability, in real-world scenarios they must also be controlled and guided by $\textbf{inferential rules}$ to remain safe, accurate, and intelligent. This requires LLMs to possess $\textbf{inferential rule-following}$ capability. However, no prior work has clearly evaluated this capability: previous studies that attempt to do so fail to distinguish inferential rule-following scenarios from instruction-following scenarios. This paper therefore first clarifies the concept of inferential rule-following and proposes a comprehensive benchmark, $\textbf{RuleBench}$, to evaluate a diversified range of inferential rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our analysis of the evaluation results provides insights into how LLMs can be improved toward becoming better inferential rule-following intelligent agents. We further propose Inferential Rule-Following Tuning (IRFT). The experimental results show that through IRFT, LLMs can learn abstract inferential rule-following abilities from purely synthetic data and then generalize to RuleBench. The data and code can be found at: https://anonymous.4open.science/r/llm-rule-following-B3E3/
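As a rough illustration of the distinction the abstract draws between instruction following and inferential rule following, an evaluation item of this kind pairs an explicit rule with a new case and checks whether the model derives the conclusion by applying the rule. The sketch below is hypothetical: the field names, prompt wording, and example rule are assumptions for illustration, not RuleBench's actual schema.

from dataclasses import dataclass

@dataclass
class RuleFollowingItem:
    rule: str          # an explicit inferential rule the model must apply
    facts: str         # a new case not covered verbatim by the rule statement
    gold_answer: str   # conclusion that follows from applying the rule to the facts

def build_prompt(item: RuleFollowingItem) -> str:
    # The model is asked to answer *by applying the given rule*, which is what
    # separates inferential rule following from ordinary instruction following.
    return (
        f"Rule: {item.rule}\n"
        f"Facts: {item.facts}\n"
        "Question: Applying the rule to the facts, what conclusion follows?"
    )

example = RuleFollowingItem(
    rule="If a customer has been inactive for more than 12 months, their account is classified as dormant.",
    facts="Alice last logged in 15 months ago.",
    gold_answer="Alice's account is classified as dormant.",
)

print(build_prompt(example))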
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: applications
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Reproduction study, Data resources
Languages Studied: English, Chinese
Submission Number: 1209