# Graph-Based ARC Challenge

## Overview

This repository contains the code and resources for **GraphARC**, a benchmark that extends the **Abstraction & Reasoning Corpus (ARC)** into a **graph-based domain**. It introduces puzzle-like tasks that test AI reasoning on general graphs rather than on the grid adjacency structure of the original ARC.


## Repository Structure

```
├── datasets/            # Stores generated graph-based ARC tasks and visualizations
│   ├── star_colorLeaves/            # Example: Star-shaped graphs with colored leaves
│   │   ├── textual/     # Encoded graph structures (adjacency, incident)
│   │   │   ├── input/  # Input graph encodings, organized by node count
│   │   │   │   ├── 5/  # Input graphs with 5 nodes
│   │   │   │   ├── 10/ # Input graphs with 10 nodes
│   │   │   │   ├── 15/ # Input graphs with 15 nodes
│   │   │   ├── output/ # Output graph encodings, organized by node count
│   │   ├── visual/      # Graph visualizations (saved as PNG)
│   │   │   ├── input/  # Input graph visualizations
│   │   │   ├── output/ # Output graph visualizations
│   │   ├── prompts/     # Stored prompts used for model queries
│   │   ├── responses/   # Stored responses from AI models
│   │   ├── specs.json   # Comprehensive metadata about this benchmark
│   ├── ...              # Other datasets
├── evaluation_data/     # Evaluation results and visualizations
│   ├── evaluation_summary.csv  # Compiled evaluation metrics
│   ├── evaluation_summary_<model>.png  # Performance charts per model
│   ├── visualizations/  # Generated visualizations and analysis charts
├── scripts/             # Python scripts for dataset generation and visualization
│   ├── benchmarks/      # Legacy graph-based ARC benchmark scripts
│   ├── tasks/           # Task implementations using the new framework
│   ├── utils/           # Core framework utilities
│   │   ├── task_base.py         # Base task class definition
│   │   ├── task_definition.py   # Task registration system
│   │   ├── properties.py        # Graph property verification with PropertyStatus enum
│   │   ├── pretransformations.py# Pre-transformation system
│   │   ├── create_graph.py      # Graph generation functions
│   │   ├── generate_data.py     # Generates graph data with transformations
│   │   ├── metadata.py          # Handles specs.json generation with thread-safe updates
│   │   ├── visualize_graph.py   # Generates graph visualizations
│   ├── README.md        # Detailed documentation on the task system
│   ├── evaluate_responses.py    # Evaluates responses from AI models
│   ├── generate_graphs.py       # Generates all graph data with comprehensive error reporting
│   ├── run_benchmarks.py        # Runs models on generated graphs
│   ├── batch_run_benchmarks.py  # Runs batch API jobs for cost-efficient processing
├── models/              # Model interface implementations
│   ├── openai/          # OpenAI API integration
│   ├── google_gemini/   # Google Gemini API integration
│   ├── qwen_local/      # Qwen model integration
├── batch_jobs/          # Job files and outputs from batch processing
├── .env                 # Contains environment variables (API keys)
├── .gitignore           # Git ignore configuration
├── .pylintrc            # Code style rules
├── load_env.sh          # Script to load environment variables from .env file
├── environment.yml      # Conda environment configuration
└── README.md            # Project documentation (this file)
```
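
The `textual/` layout above groups encodings by node count. A minimal sketch of how such a dataset directory could be traversed is shown below; the helper name `list_encodings` is hypothetical and not part of the repository, and it assumes only the `textual/<input|output>/<node_count>/` structure shown in the tree.

```python
from pathlib import Path


def list_encodings(dataset_dir, split="input"):
    """Map node count -> sorted encoding files under textual/<split>/<n>/.

    Hypothetical helper: it relies only on the directory layout shown in
    the tree above, not on any file-naming convention.
    """
    root = Path(dataset_dir) / "textual" / split
    by_size = {}
    if not root.is_dir():
        return by_size  # split not generated yet
    for size_dir in root.iterdir():
        # Sub-directories are named after the node count (e.g. 5, 10, 15).
        if size_dir.is_dir() and size_dir.name.isdigit():
            files = sorted(p for p in size_dir.iterdir() if p.is_file())
            by_size[int(size_dir.name)] = files
    return dict(sorted(by_size.items()))
```

Calling `list_encodings("datasets/star_colorLeaves")` would then return a dictionary keyed by node count, which is convenient for iterating over graph sizes in ascending order.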

## Benchmark Types

The repository implements the following graph transformation benchmark tasks:

| Benchmark Type | Description |
|----------------|-------------|
| **addHub** | Adds a new blue-colored hub node connected to all existing nodes |
| **bipartitionCompletion** | Colors nodes based on bipartite partitioning given seed nodes |
| **colorComponents** | Colors nodes based on connected component membership |
| **colorDegreeX** | Colors all nodes with degree X blue (where X = 1, 2, 3, etc.) |
| **colorInternal** | Colors all non-leaf (internal) nodes in a tree blue |
| **colorLeaves** | Colors all leaf nodes (degree 1) blue |
| **colorMaxDegree** | Colors all nodes with maximum degree blue |
| **colorMinDegree** | Colors all nodes with minimum degree blue |
| **colorNeighbors** | Colors all neighbors of a specified orange node blue |
| **colorPath** | Colors all nodes on the path between two specified nodes blue |
| **edgeToNode** | Replaces each edge with a new node connected to both endpoints |
| **removeDegreeX** | Removes all nodes with degree X (where X = 1, 2, etc.) |

For detailed documentation on the task system, graph generators, properties, and pre-transformations, see the [Task System Documentation](scripts/README.md).
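
To make the table concrete, here is a rough sketch of four of the transformations on a plain adjacency dictionary. This is an illustration only, not the repository's implementation: the function names, the `{node: set_of_neighbors}` representation, the `"grey"` default color, and the labels given to new nodes are all assumptions.

```python
def color_leaves(adj, colors, color="blue"):
    """colorLeaves: color every node of degree 1."""
    out = dict(colors)
    for node, nbrs in adj.items():
        if len(nbrs) == 1:
            out[node] = color
    return out


def color_max_degree(adj, colors, color="blue"):
    """colorMaxDegree: color every node whose degree equals the maximum."""
    out = dict(colors)
    if not adj:
        return out
    top = max(len(nbrs) for nbrs in adj.values())
    for node, nbrs in adj.items():
        if len(nbrs) == top:
            out[node] = color
    return out


def add_hub(adj, colors, hub="hub", color="blue"):
    """addHub: add a new colored node connected to all existing nodes."""
    if hub in adj:
        raise ValueError(f"node {hub!r} already exists")
    new_adj = {node: set(nbrs) | {hub} for node, nbrs in adj.items()}
    new_adj[hub] = set(adj)  # hub is adjacent to every original node
    new_colors = dict(colors)
    new_colors[hub] = color
    return new_adj, new_colors


def edge_to_node(adj):
    """edgeToNode: replace each edge (u, v) by a new node joined to u and v."""
    new_adj = {node: set() for node in adj}
    # Deduplicate undirected edges; assumes node labels are mutually comparable.
    edges = sorted({tuple(sorted((u, v))) for u in adj for v in adj[u]})
    for u, v in edges:
        mid = f"{u}-{v}"  # assumed label for the inserted node
        new_adj[mid] = {u, v}
        new_adj[u].add(mid)
        new_adj[v].add(mid)
    return new_adj


# Star graph: center 0 with leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
colors = {node: "grey" for node in star}
print(color_leaves(star, colors))  # {0: 'grey', 1: 'blue', 2: 'blue', 3: 'blue'}
print(len(edge_to_node(star)))     # 7
```

Each function returns new objects instead of mutating its arguments, so an input graph and its transformed output can be kept side by side, as in the `input/` and `output/` dataset folders.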

## Getting Started

### Prerequisites

#### Setting Up the Conda Environment

To ensure you have the correct dependencies, follow these steps:

1. **Install Conda** (if you haven't already):  
   - [Miniconda](https://docs.conda.io/en/latest/miniconda.html) (lightweight)  
   - [Anaconda](https://www.anaconda.com/) (full distribution)  

2. **Create a Conda environment from the environment file:**  
   ```sh
   conda env create -f environment.yml
   conda activate graph_arc
   ```

### API Credentials Setup

1. **Create a `.env` file** in the repository root with your API keys:
   ```
   OPENAI_API_KEY=your_openai_key_here
   GEMINI_API_KEY=your_gemini_key_here
   HUGGINGFACE_TOKEN=your_huggingface_token_here
   ```

2. **Load environment variables**:
   ```sh
   source load_env.sh
   ```
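
If sourcing a shell script is inconvenient (for example inside a notebook), the same `KEY=value` file can be read from Python. The sketch below is an assumption about what is needed, not a description of what `load_env.sh` does; `load_env` is a hypothetical helper that uses only the standard library.

```python
import os


def load_env(path=".env", override=False):
    """Read KEY=value lines from `path` into os.environ.

    Hypothetical stand-in for sourcing load_env.sh. Blank lines, comments,
    and lines without '=' are skipped. Existing variables are kept unless
    override=True.
    """
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")  # drop optional surrounding quotes
            if override or key not in os.environ:
                os.environ[key] = value
            loaded[key] = os.environ[key]
    return loaded
```

After `load_env()`, a key such as `OPENAI_API_KEY` is available through `os.environ` for the rest of the process. Values never leave the local process, so the `.env` file itself should stay out of version control.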

## Workflow

The project uses a modular workflow:

1. **Generate graph datasets** with different structures and sizes:
   ```bash
   python -m scripts.generate_graphs --benchmarks colorLeaves colorDegree2
   ```

2. **Generate prompts** with custom configurations:
   ```bash
   python -m scripts.generate_prompts --benchmarks colorLeaves --pattern scale_up_3 --system_prompt analyst
   ```

3. **Run models** on the generated prompts:
   ```bash
   # Run all existing prompts with a specific model
   python -m scripts.run_tasks --run_all_prompts --model_backend gpt-4.1-nano
   
   # Or use batch processing for OpenAI models
   python -m scripts.batch_run_tasks --run_all_prompts --model gpt-4o-mini-batch-api
   ```

4. **Evaluate model performance**:
   ```bash
   python -m scripts.evaluate_responses
   ```

5. **Visualize results**:
   ```bash
   python -m scripts.visualise_results evaluation_data/evaluation_results.json
   ```
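
The five steps above can also be chained from a small driver script. The following is a sketch under the assumption that the module names and flags are exactly those shown in the commands above; `build_pipeline` and `run_pipeline` are hypothetical helpers, and the default is a dry run that only prints each command.

```python
import subprocess
import sys


def build_pipeline(benchmarks, model_backend, pattern="scale_up_3",
                   system_prompt="analyst",
                   results_path="evaluation_data/evaluation_results.json"):
    """Return the workflow commands as argument lists, in execution order."""
    py = [sys.executable, "-m"]
    return [
        py + ["scripts.generate_graphs", "--benchmarks", *benchmarks],
        py + ["scripts.generate_prompts", "--benchmarks", *benchmarks,
              "--pattern", pattern, "--system_prompt", system_prompt],
        py + ["scripts.run_tasks", "--run_all_prompts",
              "--model_backend", model_backend],
        py + ["scripts.evaluate_responses"],
        py + ["scripts.visualise_results", results_path],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


run_pipeline(build_pipeline(["colorLeaves", "colorDegree2"], "gpt-4.1-nano"))
```

Passing `dry_run=False` executes the steps sequentially from the repository root, so a failure in dataset generation prevents any model calls from being made.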

For implementation details of the task system and graph properties, see [scripts/README.md](scripts/README.md).

## References

- F. Chollet. *On the Measure of Intelligence*. arXiv:1911.01547, 2019.
- ARC Prize Foundation. *OpenAI o3 Breakthrough High Score on ARC-AGI-Pub*. [Blog](https://arcprize.org/blog/oai-o3-pub-breakthrough), 2024.