# CausalSteward

<div align="center">

<img src="causalsteward.png" alt="CausalSteward Logo" width="900px"/>

<h3>AI-Powered Partner for Causal Discovery</h3>

![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)
![License: Non-Commercial](https://img.shields.io/badge/License-Non_Commercial-yellow.svg)
![Status: Research](https://img.shields.io/badge/Status-Research-green)
</div>

> **Important**: This repository contains the implementation associated with the paper "CausalSteward: An Agentic Divide-Conquer-Combine Copilot for Causal Discovery" by anonymous authors, submitted to ICLR 2026 under double-blind review.

## Table of Contents

- [CausalSteward](#causalsteward)
  - [Table of Contents](#table-of-contents)
  - [Overview](#overview)
    - [🔍 Key Innovations](#-key-innovations)
    - [📊 Performance Highlights](#-performance-highlights)
  - [Quick Start](#quick-start)
  - [Project Goals](#project-goals)
  - [Technical Stack](#technical-stack)
  - [Project Structure](#project-structure)
  - [Installation](#installation)
    - [Prerequisites](#prerequisites)
    - [Environment Setup](#environment-setup)
    - [API Key Configuration](#api-key-configuration)
  - [Running the Application](#running-the-application)
    - [Via User Interface](#via-user-interface)
    - [Via Command Line](#via-command-line)
  - [Extending the Model](#extending-the-model)
  - [How to Cite](#how-to-cite)
    - [BibTeX](#bibtex)
  - [License](#license)

## Overview

CausalSteward is an AI-powered partner for causal modeling. It extends existing causal discovery systems with integrated methods for data analysis, model building, and causal inference.

Our goal is a robust framework that handles the complexities of real-world data, including high dimensionality and unstructured data, while rigorously validating its inferences.

<div align="center">
<table>
<tr>
<td width="50%">

### 🔍 Key Innovations

- **Divide-Conquer-Combine** architecture for complex causal graphs
- **Statistical validation** of LLM-proposed causal structures through FCI
- **Multi-agent system** with specialized roles for all phases
- **Domain knowledge integration** via agentic RAG and human-in-the-loop feedback

</td>
<td width="50%">

### 📊 Performance Highlights  

- Strong results on non-identifiable benchmarks for causal discovery
- Lower computational requirements than traditional causal discovery methods
- Handles graphs with **100+ variables**

</td>
</tr>
</table>
</div>


## Quick Start

```bash
# Set up environment
conda env create -f environment.yml
conda activate CausalSteward

# Set up API keys
cp example.env .env
# Edit .env with your Azure OpenAI API keys

# Run the UI
cd code
streamlit run src/ui.py
```

## Project Goals

The primary objectives of CausalSteward include:

- **Interactive Model Building**: Facilitate the interactive construction of causal models, allowing users to analyze causality in data effectively.
- **Leveraging External Knowledge**: Develop capabilities to extract causal knowledge from large datasets, enhancing the model's understanding and reasoning.
- **Statistical Validation**: Derive local causal graphs by combining FCI with the LLM's prior knowledge.
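
To make the divide-conquer-combine idea concrete, here is a toy sketch in plain Python. It is not the repository's implementation: the function names, the hand-picked clusters, and the `toy_local_discovery` stand-in (which simply looks up a known edge set instead of running FCI or querying an LLM) are all assumptions made for illustration, and the combine step is a plain union with no conflict resolution.

```python
from typing import Callable, Iterable

Edge = tuple[str, str]  # directed edge written as (cause, effect)


def divide_conquer_combine(
    clusters: Iterable[list[str]],
    local_discovery: Callable[[list[str]], set[Edge]],
) -> set[Edge]:
    """Run a local discovery step per cluster and merge the resulting edges."""
    global_edges: set[Edge] = set()
    for cluster in clusters:                    # divide: clusters are given and may overlap
        local_edges = local_discovery(cluster)  # conquer: learn a local graph
        global_edges |= local_edges             # combine: plain union of edges
    return global_edges


# Hypothetical ground truth, used only so the toy local step has something to return.
TRUE_EDGES: set[Edge] = {
    ("smoking", "cancer"),
    ("cancer", "xray"),
    ("cancer", "dyspnoea"),
}


def toy_local_discovery(cluster: list[str]) -> set[Edge]:
    """Stand-in for a real local step: keep true edges whose endpoints are both in the cluster."""
    members = set(cluster)
    return {(a, b) for (a, b) in TRUE_EDGES if a in members and b in members}


clusters = [["smoking", "cancer"], ["cancer", "xray", "dyspnoea"]]
graph = divide_conquer_combine(clusters, toy_local_discovery)
print(sorted(graph))
# [('cancer', 'dyspnoea'), ('cancer', 'xray'), ('smoking', 'cancer')]
```

The clusters overlap on `cancer`, which is what lets the merged graph connect the two local graphs; with disjoint clusters, edges that cross cluster boundaries would be lost.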

## Technical Stack

CausalSteward is built using the following technologies:

- **Programming Language**: Python 3.12
- **Multi-agent Framework**: [LangGraph](https://github.com/langchain-ai/langgraph)
- **User Interface**: [Streamlit](https://streamlit.io/)
- **LLM Framework**: [LangChain](https://github.com/langchain-ai/langchain)
- **Language Models**:
  - GPT-4o-mini
  - o3-mini
  - Qwen3-14B

## Project Structure

The most important files in the repository are:

```
CausalSteward/
├── src/
│   ├── main.py            # Main application entry point
│   ├── ui.py              # Streamlit UI implementation
│   ├── discovery_tools.py # Tools for causal discovery such as FCI
│   ├── state.py           # State management
│   └── utils/             # Utility functions
│       ├── tool_utils.py  # Data processing utilities
│       └── llm_utils.py   # LLM integration utilities
├── data/
│   ├── ASIA/              # ASIA Dataset
│   ├── neuropathic_pain/  # Neuropathic pain dataset
│   ├── causalman_medium/  # Causalman Medium dataset
│   └── causalman_small/   # Causalman Small dataset
├── environment.yml        # Conda environment spec
└── example.env            # Template for .env file
```

## Installation

### Prerequisites

- Python 3.12.7
- Conda package manager

### Environment Setup

We provide an `environment.yml` file for easy setup:

```bash
# Create environment
conda env create -f environment.yml

# Activate environment
conda activate CausalSteward
```

### API Key Configuration

1. Copy the example environment file and rename it:
   ```bash
   cp example.env .env
   ```

2. Edit the `.env` file with your Azure OpenAI API credentials:
   ```
   AZURE_OPENAI_API_KEY=your_api_key
   AZURE_OPENAI_ENDPOINT=your_endpoint
   AZURE_OPENAI_API_VERSION=your_api_version
   AZURE_OPENAI_DEPLOYMENT=your_deployment
   AZURE_OPENAI_MODEL_NAME=your_model_name
   ```
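
Before launching the application, it can help to confirm that every variable listed above is actually set. The following is a minimal, standard-library-only sketch; the helper names `parse_env_file` and `missing_keys` are hypothetical and not part of the repository, and the parser only handles simple `KEY=VALUE` lines.

```python
import os
from pathlib import Path

# Variable names taken from the .env template above.
REQUIRED_KEYS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_MODEL_NAME",
]


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_values):
    """Return required keys that are empty or absent in both the file and os.environ."""
    return [k for k in REQUIRED_KEYS if not (env_values.get(k) or os.environ.get(k))]


if __name__ == "__main__":
    missing = missing_keys(parse_env_file(".env"))
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from the directory that contains `.env`; any names printed after `Missing:` still need a value.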

## Running the Application

### Via User Interface

For an interactive experience, use the Streamlit interface:

```bash
cd code
streamlit run src/ui.py
```

### Via Command Line

For command line execution:

```bash
cd code
python src/main.py
```

## Extending the Model

To use different language models, modify the configuration in `src/utils/llm_utils.py`. The system currently supports:

- Azure OpenAI models (GPT-4o-mini, o3-mini)
- Qwen3-14B (open source)
- Other models compatible with the LangChain framework
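
As a rough illustration of what such a configuration might look like, the sketch below maps the `.env` variables onto the keyword arguments of LangChain's `AzureChatOpenAI` class from the `langchain-openai` package. The actual contents of `llm_utils.py` are not shown in this README, so the helper names `azure_chat_kwargs` and `build_chat_model` are hypothetical, and it is an assumption that `langchain-openai` is available in the environment.

```python
import os


def azure_chat_kwargs(env=os.environ):
    """Map the .env variable names onto AzureChatOpenAI keyword arguments."""
    return {
        "api_key": env["AZURE_OPENAI_API_KEY"],
        "azure_endpoint": env["AZURE_OPENAI_ENDPOINT"],
        "api_version": env["AZURE_OPENAI_API_VERSION"],
        "azure_deployment": env["AZURE_OPENAI_DEPLOYMENT"],
    }


def build_chat_model(env=os.environ):
    """Construct a chat model; requires the langchain-openai package."""
    # Imported lazily so azure_chat_kwargs works without langchain-openai installed.
    from langchain_openai import AzureChatOpenAI

    return AzureChatOpenAI(**azure_chat_kwargs(env))
```

Switching to another LangChain-compatible model would then amount to replacing `build_chat_model` with a function that returns a different chat model class.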

## How to Cite

If you find this work useful or relevant to your research, please consider citing it:

### BibTeX
```bibtex
@article{causalsteward2025,
  title     = {CausalSteward: An Agentic Divide-Conquer-Combine Copilot for Causal Discovery},
  author    = {Anonymous authors},
  year      = {2025},
  note      = {Under review}
}
```

## License

This project is licensed under a Non-Commercial License; see the LICENSE file for details.