# Inverse Design for Text Generation with Accurate and Complex Causal Graph

This repository is the official implementation of Inverse Design for Text Generation with Accurate and Complex Causal Graph.

## Overview

This repository contains the implementation of iTAG (**i**nverse design for **T**ext gener**A**tion with causal **G**raph), a method for generating textual data with accurate and complex causal graphs. It provides tools both for exploring the method's principles and for reproducing the experimental results from our paper.

## Repository Structure

```
iTAG
│
├── 4.1_Evaluating_annotation_accuracy_across_complexities
│   ├── data_generation
│   └── data_process
│
├── 4.2_Investigating_the_substitutability_of_generated_data_for_real-world_data
│   ├── causal_discovery_methods
│   ├── datasets
│   └── results_evaluation
│
├── imgs
│
├── README.md
│
└── Textual_Causal_Data_Generation_with_iTAG.ipynb
```

## Quick Start

To quickly understand the principles of iTAG, explore the `Textual_Causal_Data_Generation_with_iTAG.ipynb` notebook, which contains code explanations and examples.

## Reproducing Experimental Results

To fully reproduce the experimental results from our paper, refer to the following directories:

### 4.1_Evaluating_annotation_accuracy_across_complexities
- `data_generation`: Contains implementations for data generation using both the SOTA baseline (Davinci) and our method (iTAG)
- `data_process`: Contains implementations for ground truth generation and metrics calculation for causal graph annotation accuracy
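The README does not state which annotation-accuracy metrics `data_process` computes. As a hedged illustration only, one common choice is edge-level precision, recall, and F1 between an annotated causal graph and its ground truth, treating each graph as a set of directed (cause, effect) edges. The helper `edge_prf` and the toy variable names below are hypothetical and are not taken from the repository.

```python
from typing import Iterable, Set, Tuple

# A directed causal edge: (cause, effect). Hypothetical representation.
Edge = Tuple[str, str]


def edge_prf(predicted: Iterable[Edge], ground_truth: Iterable[Edge]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over directed edges (illustrative helper, not the repo's code)."""
    pred: Set[Edge] = set(predicted)
    gold: Set[Edge] = set(ground_truth)
    tp = len(pred & gold)  # edges whose endpoints and direction both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: one correct edge, one reversed edge, one missing edge.
gold = [("smoking", "cancer"), ("cancer", "fatigue"), ("age", "cancer")]
annotated = [("smoking", "cancer"), ("fatigue", "cancer")]
print(edge_prf(annotated, gold))
```

Under this convention a reversed edge counts as both a false positive and a false negative; other metrics, such as structural Hamming distance, treat reversals differently.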

### 4.2_Investigating_the_substitutability_of_generated_data_for_real-world_data
- `causal_discovery_methods`: Contains implementations of SOTA causal discovery methods, including both LLM-based and non-LLM approaches
- `datasets`: Contains three real-world datasets used in our experiments
- `results_evaluation`: Contains implementations for calculating causal graph discovery accuracy on both generated and real-world data, and for computing the statistical correlation between the two sets of results
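The exact correlation statistic used in `results_evaluation` is not specified here. A minimal sketch, assuming the comparison is between per-method discovery accuracies on generated data and on real-world data, computes Pearson and Spearman correlations with the standard library only. The function names and the four accuracy values are hypothetical placeholders, not results from the paper.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder accuracies for four hypothetical discovery methods (not paper results).
acc_generated = [0.62, 0.48, 0.71, 0.55]
acc_real = [0.58, 0.41, 0.69, 0.57]
print(pearson(acc_generated, acc_real), spearman(acc_generated, acc_real))
```

Spearman depends only on how the methods are ordered, so it captures whether generated data ranks causal discovery methods the same way real-world data does; Pearson additionally reflects whether the score differences are proportional.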

## External References

This repository uses `.gitkeep` placeholder files to mark where code and data from other works belong. Each placeholder is listed below with the work it should reference:

- `./4.1_Evaluating_annotation_accuracy_across_complexities/data_generation/Davinci/Davinci.gitkeep`
  - Implementation should reference arXiv:2403.07118

- `./4.2_Investigating_the_substitutability_of_generated_data_for_real-world_data/causal_discovery_methods/non-LLMs/CGEN.gitkeep`
  - Implementation should reference arXiv:2102.05638

- `./4.2_Investigating_the_substitutability_of_generated_data_for_real-world_data/causal_discovery_methods/non-LLMs/CLEANN.gitkeep`
  - Implementation should reference arXiv:2310.20307

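- `./4.2_Investigating_the_substitutability_of_generated_data_for_real-world_data/causal_discovery_methods/non-LLMs/PA.gitkeep` and `./4.2_Investigating_the_substitutability_of_generated_data_for_real-world_data/causal_discovery_methods/non-LLMs/SA.gitkeep`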
  - Implementation should reference https://doi.org/10.1145/3506804

- `./4.2_Investigating_the_substitutability_of_generated_data_for_real-world_data/datasets/` (`MIMIC_IV_ver.2.2_NOTE`, `FinCausal_2025`, `JUSTICE`)
  - The original datasets should be referenced as follows:
    - `MIMIC_IV_ver.2.2_NOTE`: https://physionet.org/content/mimic-iv-note/2.2/
    - `FinCausal_2025`: https://www.lllf.uam.es/wordpress/fincausal-25/
    - `JUSTICE`: https://github.com/Sanavesa/JUSTICE-Judgment-Prediction
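After cloning, it can help to list which placeholders are present, since each one marks an external component to obtain from the work cited above. The helper below is a hypothetical convenience script, not part of the repository; it assumes only that placeholder file names end in `.gitkeep`, as in the list above.

```python
from pathlib import Path
from typing import List


def find_placeholders(repo_root: str = ".") -> List[str]:
    """Return repo-relative paths of all *.gitkeep placeholder files, sorted."""
    root = Path(repo_root)
    # rglob searches every subdirectory; as_posix() gives forward-slash paths on all OSes.
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.gitkeep"))


if __name__ == "__main__":
    # Run from the iTAG repository root to print each placeholder path.
    for path in find_placeholders("."):
        print(path)
```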

## Requirements

To install requirements:
```bash
pip install -r requirements.txt
```

## Contributing

This project is licensed under the MIT License. If you find any issues or have suggestions for improvements, please open an issue.