Abstract: In practical statistical causal discovery (SCD), embedding domain expert knowledge as constraints into the algorithm is important for obtaining reasonable causal models that reflect the broad knowledge of domain experts, yet the systematic acquisition of such background knowledge remains challenging.
To overcome these challenges, this paper proposes a novel method for causal inference in which SCD and knowledge-based causal inference (KBCI) with a large language model (LLM) are synthesized through "statistical causal prompting (SCP)" for LLMs and prior knowledge augmentation for SCD.
The experiments in this work have revealed that the results of LLM-KBCI, and of SCD augmented with LLM-KBCI, approach the ground truths more closely than the SCD results obtained without prior knowledge.
These experiments have also revealed that the SCD results can be further improved when the LLM undergoes SCP.
Furthermore, with an unpublished real-world dataset, we have demonstrated that the background knowledge provided by the LLM can improve SCD on this dataset, even though the dataset has never been included in the LLM's training data.
Toward future practical application of the proposed method in important domains such as healthcare, we also thoroughly discuss its limitations, the risks of critical errors, expected improvements in techniques around LLMs, and realistic ways to integrate expert checks of the results into this automated process, supported by SCP simulations under various conditions covering both successful and failure scenarios.
The careful and appropriate application of the approach proposed in this work, with improvement and customization for each domain, can thus address challenges such as dataset biases and limitations, illustrating the potential of LLMs to improve data-driven causal inference across diverse scientific domains.
The code used in this work will be made publicly available upon the acceptance of this paper.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=rJRmspwYuW&nesting=2&sort=date-desc
Changes Since Last Submission: Dear Action Editor and Reviewers,
We sincerely thank you for your constructive feedback on our previous submission of paper 2730.
Your insightful comments have guided us to enhance the manuscript.
Below, we outline the major changes, highlighting how the suggestions from your reviews of our previous submission have been addressed.
All the revisions after the last version of paper 2730 are highlighted in magenta in the manuscript for your convenience.
1. In-depth Discussion on Limitations, Risks, and Failure Cases
Related to: Action Editor (8bgj), Reviewer SVi9, Reviewer 1vYs, and Reviewer 3L5q
We recognize in particular that Reviewer SVi9 and Action Editor 8bgj emphasized the need for a comprehensive discussion on the limitations and risks of the proposed SCP method, including failure cases.
Therefore, we have significantly expanded Section 4.3, adding a thorough analysis of the limitations and risks associated with SCP.
The new content includes:
・Detailed examination of failure cases, analyzing the conditions under which SCP may lead to misinterpretations or biased causal discovery.
・Practical strategies for risk mitigation, such as expert verification, simulation-based robustness assessments, and careful deployment in critical applications like healthcare.
・A comparative analysis of scenarios where SCP performs well and those where it may face challenges, offering a balanced perspective on its applicability.
These additions aim to provide a clearer understanding of the robustness and constraints of SCP, addressing the concerns raised by the Action Editor and Reviewers.
2. Addressing Reproducibility and Verification in other LLMs
Related to: Reviewer 3L5q
Reviewer 3L5q emphasized the importance of improving reproducibility through public code release and evaluation with open-source models.
In response to this comment, we have explicitly stated in the Abstract our intention to release the code upon acceptance to ensure transparency and reproducibility.
Additionally, we have included Appendix H, which presents experimental results using alternative large language models (e.g., GPT-3.5 variants).
These experiments validate the generalizability of SCP across different LLM architectures, addressing concerns about over-reliance on a single LLM.
We recognize that these revisions collectively address the key concerns raised by the Action Editor and all Reviewers in the previous submission.
We believe the revised manuscript is now significantly stronger and more comprehensive than the previous version of submission 2730, and is ready for re-evaluation.
We welcome further feedback and hope this submission meets your expectations.
Sincerely,
Authors of Paper 2730
Assigned Action Editor: ~Fabio_Stella1
Submission Number: 4060