Abstract: Causal reasoning is one of the primary bottlenecks that Large Language Models (LLMs) must overcome to attain human-level intelligence. Recent studies indicate that LLMs display near-random performance on causal reasoning tasks. To address this, we introduce the Causal Chain of Prompting ($\text{C}^2\text{P}$), the first reasoning framework that equips current LLMs with causal reasoning capabilities. $\text{C}^2\text{P}$ operates autonomously, without relying on external tools or modules in either the causal learning or the reasoning phase, and can be seamlessly integrated into the training or fine-tuning of LLMs. To evaluate the performance of $\text{C}^2\text{P}$, we first demonstrate that reasoning accuracy improved by over $30.7\%$ and $25.9\%$ for GPT-4 Turbo and LLaMA 3.1, respectively, when using our framework, compared to the same models without $\text{C}^2\text{P}$ on a synthetic benchmark dataset. Then, using few-shot learning of the same LLMs with $\text{C}^2\text{P}$, reasoning accuracy increased by over $20.05\%$ and $20.89\%$, respectively, with as few as ten examples, compared to the corresponding LLMs without $\text{C}^2\text{P}$ on the same dataset. To better evaluate $\text{C}^2\text{P}$ in realistic scenarios, we utilized another benchmark dataset containing natural stories across various fields, including healthcare, medicine, economics, education, social sciences, environmental science, and marketing. The results show improved reasoning when $\text{C}^2\text{P}$ is applied, compared to cases where our framework is not used, which often lead to random or hallucinated responses. The improvement observed in both few-shot learned GPT-4 Turbo and LLaMA 3.1 provides evidence of the generalizability of $\text{C}^2\text{P}$, highlighting its potential to be incorporated into the training or fine-tuning of new LLMs to enhance their reasoning capabilities.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Fredrik_Daniel_Johansson1
Submission Number: 3091