Inverse Design for Text Generation with Accurate and Complex Causal Graph

30 Apr 2025 (modified: 29 Oct 2025). Submitted to NeurIPS 2025. License: CC BY 4.0
Keywords: Causal Inference, Inverse Design, Data Generation
TL;DR: This paper generates text annotated with causal graphs for causal inference, where annotated real-world data is lacking.
Abstract: Developing and evaluating causal discovery methods requires large quantities of data annotated with causal structure. However, real-world data with such annotations is scarce, which makes generating text with causal structure annotations a foundational task for advancing causal discovery research. Existing data generation methods cannot ensure both the accuracy and the complexity of the causal structure. To address this, we apply inverse design from scientific computing to Chain-of-Thought (CoT) and propose a method named iTAG, which can generate large quantities of text with accurate and complex causal graphs. Two experiments demonstrate that iTAG-generated data can substitute for real-world data. First, an annotation accuracy evaluation shows high causal graph annotation accuracy across complexity levels (F1>96\%, SHD<1, SID<0.5). Second, a substitutability analysis reveals strong statistical correlation between generated and real-world text across metrics computed with existing causal discovery algorithms (Pearson=0.96, Spearman=0.94, $R^2=0.93$).
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 5120
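
Note on the annotation-accuracy metrics: the abstract reports edge-level F1 and structural Hamming distance (SHD) between annotated and reference causal graphs. The following is a minimal sketch of how these two metrics are commonly computed, not the authors' evaluation code. The function names and the toy graph are hypothetical, graphs are assumed to be given as sets of directed (cause, effect) edges, and counting a reversed edge as a single SHD error is one common convention that the paper may or may not follow. SID is omitted because it requires reasoning about interventional distributions.

```python
def edge_f1(true_edges, pred_edges):
    """F1 score over directed edges (an edge matches only if its direction matches)."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def shd(true_edges, pred_edges):
    """Structural Hamming distance: number of node pairs whose edge status differs.

    Assumed convention: a missing, extra, or reversed edge each counts as 1.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    # Collect every unordered node pair that carries an edge in either graph.
    pairs = {tuple(sorted(edge)) for edge in true_edges | pred_edges}
    distance = 0
    for u, v in pairs:
        true_state = ((u, v) in true_edges, (v, u) in true_edges)
        pred_state = ((u, v) in pred_edges, (v, u) in pred_edges)
        if true_state != pred_state:
            distance += 1
    return distance


# Hypothetical toy graphs: one correct edge, one reversed, one missing, one extra.
reference = {("rain", "wet"), ("wet", "slip"), ("rain", "umbrella")}
annotated = {("rain", "wet"), ("slip", "wet"), ("rain", "traffic")}

print(edge_f1(reference, annotated))
print(shd(reference, annotated))
```

Under this convention, an annotated graph identical to the reference yields F1 = 1 and SHD = 0, so the reported F1>96\% and SHD<1 correspond to fewer than one edge error per graph on average.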