
#####################################################################################################
# This folder comprises the code and simulation files featured in the paper: 	    
#										          									  
#     On Learning Necessary and Sufficient Causal Graphs     
#####################################################################################################


The causal revolution has spurred interest in understanding complex relationships across various fields. Most existing methods aim to discover causal relationships among all variables within a large-scale complex graph. However, in practice, only a small subset of variables in the graph are relevant to the outcomes of interest. Consequently, causal estimation with the full causal graph, particularly given limited data, could lead to numerous falsely discovered, spurious variables that exhibit high correlation with, but exert no causal impact on, the target outcome. In this paper, we propose learning a class of necessary and sufficient causal graphs (NSCG) that exclusively comprises causally relevant variables for an outcome of interest, which we term causal features. The key idea is to employ probabilities of causation to systematically evaluate the importance of features in the causal graph, allowing us to identify a subgraph pertinent to the outcome of interest. To learn NSCG from data, we develop a necessary and sufficient causal structural learning (NSCSL) algorithm, by establishing theoretical properties and relationships between probabilities of causation and natural causal effects of features. Across empirical studies of simulated and real data, we demonstrate that NSCSL outperforms existing algorithms and can reveal crucial yeast genes for target heritable traits of interest.

 
#####################################################################################################

## Requirements

- Python >= 3.7
- `numpy`
- `pandas`
- `scipy`
- `networkx`
- `multiprocessing`, `argparse`, `pickle`, `os` (part of the Python standard library; no separate installation needed)
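
One way to check that the third-party packages among the requirements are installed before running the notebooks is sketched below. The helper name `missing_packages` is hypothetical and not part of this repository.

```python
import importlib.util

# Third-party packages from the Requirements list; the remaining entries
# ship with Python itself.
REQUIRED = ["numpy", "pandas", "scipy", "networkx"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required third-party packages are available.")
```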


## Contents

- Main and Benchmark Functions:
  1. `notear_NS.py`: the implementation of necessary and sufficient causal structural learning (NSCSL) based on NOTEARS.
  2. `utils.py`: graph simulation, data simulation, utility functions, analysis of causal effects, and accuracy evaluation.
  3. `notear.py`: the main function of NOTEARS (the benchmark method) by Zheng et al. (2018).

- Experiments:
  1. `Simulation and Comparison for Scenario 1.ipynb`: the design, experiment, comparison, and results for Scenario 1.
  2. `Simulation and Comparison for Scenario 2.ipynb`: the design, experiment, comparison, and results for Scenario 2.
  3. `Simulation and Comparison for Scenario 3.ipynb`: the design, experiment, comparison, and results for Scenario 3.
  4. `Simulation and Comparison for Scenario 4.ipynb`: the design, experiment, comparison, and results for Scenario 4. 
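
For readers new to NOTEARS, the following is a minimal sketch of two ingredients that the files above build on: simulating data from a linear structural equation model and evaluating the NOTEARS acyclicity function h(W) = tr(exp(W ∘ W)) - d from Zheng et al. (2018), which is zero exactly when the weighted adjacency matrix W describes a DAG. The function names, the example matrices, and the convention that `W[i, j]` is the weight of edge i -> j are assumptions made for illustration; they are not the API of `utils.py`, `notear.py`, or `notear_NS.py`.

```python
import numpy as np
from scipy.linalg import expm


def notears_acyclicity(W):
    """NOTEARS acyclicity measure h(W) = tr(exp(W * W)) - d (elementwise square)."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d


def simulate_linear_sem(W, n, rng):
    """Draw n samples from a linear SEM with standard Gaussian noise.

    Assumes W is strictly upper triangular, so the node order 0, 1, ..., d-1
    is already a topological order and parents are generated before children.
    """
    d = W.shape[0]
    X = np.zeros((n, d))
    for j in range(d):
        # Node j is a weighted sum of its parents (indices < j) plus noise.
        X[:, j] = X[:, :j] @ W[:j, j] + rng.standard_normal(n)
    return X


rng = np.random.default_rng(0)

# Illustrative chain graph 0 -> 1 -> 2.
W_dag = np.array([[0.0, 1.5, 0.0],
                  [0.0, 0.0, -2.0],
                  [0.0, 0.0, 0.0]])

# Adding the edge 2 -> 0 closes the cycle 0 -> 1 -> 2 -> 0.
W_cyc = W_dag.copy()
W_cyc[2, 0] = 0.5

X = simulate_linear_sem(W_dag, n=200, rng=rng)
print("data shape:", X.shape)
print("h(W_dag):", notears_acyclicity(W_dag))  # numerically zero for a DAG
print("h(W_cyc):", notears_acyclicity(W_cyc))  # strictly positive with a cycle
```

In NOTEARS, h(W) = 0 is imposed as an equality constraint while a score such as least squares is minimized over W, which turns the combinatorial DAG search into a continuous optimization problem.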


**Note that the real data of gene expression traits in yeast cannot be shared due to the privacy protocol.**


## Acknowledgments
Our work and code benefit from the following existing works, for which we are very grateful.

* DAG NOTEARS https://github.com/xunzheng/notears
* Py-causal Package https://github.com/bd2kccd/py-causal
* LiNGAM https://pypi.org/project/lingam/
* DAG-GNN https://github.com/fishmoon1234/DAG-GNN
