On Incorporating Prior Knowledge Extracted From Large Language Models Into Causal Discovery

Chanhui Lee, Juhyeon Kim, Yongjun Jeong, Yoonseok Yeom, Juhyun Lyu, Junghee Kim, Sangmin Lee, Sangjun Han, Hyeokjun Choe, Soyeon Park, Woohyung Lim, Sungbin Lim, Sanghack Lee

Published: 01 Jan 2025, Last Modified: 08 Jan 2026 · IEEE Access · CC BY-SA 4.0
Abstract: Large Language Models (LLMs) can reason about causality by leveraging vast pre-trained knowledge and text descriptions of datasets, demonstrating their effectiveness even when data is scarce. However, current LLM-based causal reasoning methods have crucial limitations: 1) LLM prompting is inherently inefficient for utilizing large tabular datasets, given context-length consumption, and 2) such methods struggle to reason about the causal structure as an interconnected whole. On the other hand, data-driven causal discovery can recover the causal structure as a whole, but it works well only when the number of observations is sufficiently large. To overcome the limitations of each approach, we propose a new framework that integrates LLM-based causal reasoning into data-driven causal discovery, resulting in improved and robust performance. Furthermore, our framework extends to time-series data, where it likewise exhibits superior performance.
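The abstract describes combining LLM-elicited causal knowledge with data-driven causal discovery but does not spell out the mechanism; the sketch below is not the paper's algorithm. It is a minimal, self-contained illustration of the general idea under one possible assumption: LLM judgments about pairwise edges are encoded as required (+1) / forbidden (-1) constraints on a greedy, BIC-scored search over DAGs for linear-Gaussian data. All names (e.g., `discover_with_llm_prior`) and the prior encoding are hypothetical.

```python
"""Hedged sketch: LLM-derived edge priors as hard constraints on a
score-based causal discovery search. Not the authors' method."""
import numpy as np


def bic_node(data, j, parents):
    """BIC contribution of node j given its parent set (linear-Gaussian model)."""
    n = data.shape[0]
    y = data[:, j]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents]) if parents else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid @ resid / n, 1e-12)
    return -0.5 * n * np.log(sigma2) - 0.5 * X.shape[1] * np.log(n)


def bic(data, adj):
    """Decomposable BIC score of a DAG given as an adjacency matrix (adj[i, j] = 1 means i -> j)."""
    d = adj.shape[0]
    return sum(bic_node(data, j, list(np.flatnonzero(adj[:, j]))) for j in range(d))


def creates_cycle(adj, i, j):
    """Would adding i -> j create a directed cycle? (DFS for a path j -> ... -> i)."""
    stack, seen = [j], set()
    while stack:
        v = stack.pop()
        if v == i:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(np.flatnonzero(adj[v]))
    return False


def discover_with_llm_prior(data, prior):
    """Greedy edge-addition search that respects LLM-derived constraints.

    prior[i, j] = +1: LLM asserts i -> j (added up front if acyclic);
    prior[i, j] = -1: LLM judges i -> j implausible (never added);
    prior[i, j] =  0: no opinion (decided by the data via BIC).
    """
    d = data.shape[1]
    adj = np.zeros((d, d), dtype=int)
    for i in range(d):
        for j in range(d):
            if i != j and prior[i, j] == 1 and not creates_cycle(adj, i, j):
                adj[i, j] = 1
    score = bic(data, adj)
    improved = True
    while improved:
        improved = False
        for i in range(d):
            for j in range(d):
                # Skip self-loops, existing edges, forbidden edges, and cycle-inducing edges.
                if i == j or adj[i, j] or prior[i, j] == -1 or creates_cycle(adj, i, j):
                    continue
                adj[i, j] = 1
                new_score = bic(data, adj)
                if new_score > score + 1e-6:
                    score, improved = new_score, True
                else:
                    adj[i, j] = 0
    return adj


if __name__ == "__main__":
    # Toy linear SCM X0 -> X1 -> X2, with a prior forbidding X2 -> X0 and requiring X0 -> X1.
    rng = np.random.default_rng(0)
    n = 500
    x0 = rng.normal(size=n)
    x1 = 0.8 * x0 + rng.normal(scale=0.5, size=n)
    x2 = 0.6 * x1 + rng.normal(scale=0.5, size=n)
    data = np.column_stack([x0, x1, x2])
    prior = np.zeros((3, 3), dtype=int)
    prior[2, 0] = -1
    prior[0, 1] = 1
    print(discover_with_llm_prior(data, prior))
```

In this toy setup, edges the LLM forbids are simply excluded from the search space and edges it requires are fixed in advance, while the remaining edges are decided by the data through the BIC score; softer schemes (e.g., weighting the score by the LLM's confidence) are an equally plausible reading of the abstract.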