Abstract: Recent years have witnessed significant progress in the reasoning abilities of large language models (LLMs), largely due to chain-of-thought (CoT) approaches, which allow models to generate intermediate reasoning steps before reaching a final answer.
Building on these advances, state-of-the-art LLMs are instruction-tuned to provide long and detailed CoT pathways when responding to reasoning-related questions. 
However, humans are natural cognitive misers and often prompt language models to give short responses, which creates a significant conflict with CoT reasoning.
In this paper, we delve into how LLMs' reasoning performance changes when users provide short-path prompts.
The results and analysis reveal that instruction-tuned models can reason effectively and robustly without explicit CoT prompts, whereas under short-path prompting, LLMs tend to guess the final answer and their reasoning ability becomes unstable, even on grade-school problems.
Furthermore, we propose two approaches to explore whether these decision-making biases can be calibrated to prioritize reasoning accuracy over excessive instruction following.
Experimental results show that both methods achieve high accuracy, providing insights into the trade-off between instruction following and reasoning accuracy in current models.
Paper Type: Long
Research Area: Question Answering
Research Area Keywords: prompting, reasoning, large language model
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 5856