Abstract: Large Language Models (LLMs) can reason about causality by leveraging vast pre-trained knowledge and textual descriptions of datasets, and they remain effective even when observational data is scarce. However, current LLM-based causal reasoning methods have two critical limitations: 1) LLM prompting is inherently inefficient for utilizing large tabular datasets once context-length consumption is taken into account, and 2) these methods are not well suited to reasoning about the causal structure as an interconnected whole. Data-driven causal discovery, by contrast, can recover the entire causal structure, but it works well only when the number of observations is sufficiently large. To overcome the limitations of each approach, we propose a new framework that integrates LLM-based causal reasoning into data-driven causal discovery, yielding improved and more robust performance. Furthermore, our framework extends to time-series data, where it also exhibits superior performance.
External IDs: doi:10.1109/access.2025.3626040
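The abstract does not specify how the LLM's judgments are injected into the data-driven search, so the following is only a minimal sketch of the general idea under assumed choices: LLM-elicited pairwise edge beliefs are treated as a soft prior added to a Gaussian BIC score, and the combined score is maximized by greedy hill climbing over DAGs. The names `llm_edge_prior`, `bic_score`, `scored`, and `hill_climb`, the BIC-plus-log-prior objective, and the weighting parameter `lam` are illustrative assumptions, not the paper's actual algorithm.

```python
# Illustrative sketch (not the paper's method): score-based causal discovery
# whose objective combines a data-driven BIC term with an LLM-derived edge prior.
import numpy as np


def bic_score(data, child, parents):
    """Gaussian BIC of a linear regression of `child` on `parents`."""
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents]) if parents else np.ones((n, 1))
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = max(resid.var(), 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * X.shape[1] * np.log(n)


def scored(data, adj, llm_edge_prior, lam=1.0):
    """Total score = sum of per-node BIC terms + lam * log-prior from LLM edge beliefs."""
    d = data.shape[1]
    total = sum(bic_score(data, j, [i for i in range(d) if adj[i, j]]) for j in range(d))
    # llm_edge_prior[i, j] is the LLM's belief (in [0, 1]) that an edge i -> j exists.
    log_prior = np.where(adj, np.log(llm_edge_prior + 1e-6), np.log(1 - llm_edge_prior + 1e-6)).sum()
    return total + lam * log_prior


def is_dag(adj):
    """Check acyclicity by repeatedly peeling off sink nodes."""
    remaining = list(range(adj.shape[0]))
    while remaining:
        sinks = [v for v in remaining if not adj[v, remaining].any()]
        if not sinks:
            return False
        remaining = [v for v in remaining if v not in sinks]
    return True


def hill_climb(data, llm_edge_prior, lam=1.0, max_iter=100):
    """Greedy single-edge additions/removals that maximize the combined score."""
    d = data.shape[1]
    adj = np.zeros((d, d), dtype=bool)
    best = scored(data, adj, llm_edge_prior, lam)
    for _ in range(max_iter):
        improved = False
        for i in range(d):
            for j in range(d):
                if i == j:
                    continue
                cand = adj.copy()
                cand[i, j] = not cand[i, j]  # toggle edge i -> j
                if not is_dag(cand):
                    continue
                s = scored(data, cand, llm_edge_prior, lam)
                if s > best:
                    adj, best, improved = cand, s, True
        if not improved:
            break
    return adj
```

In this sketch the weight `lam` trades off fit to the observed data against the LLM prior, so the prior dominates when observations are scarce and the data term dominates as the sample size grows, which mirrors the robustness the abstract claims for the combined framework.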