# Graph Evaluation Tool

Evaluates language models using graph-based coherence analysis. Computes the Graph Coherence Score (GCS) and coherence variance metrics for each model, reported over the full graph and per category.

## Usage

```bash
python graph_evaluation.py --graph graph.json --questions questions.json --results ans/ --output results.json
```

## Input Formats

### Model Results JSON
```json
{
  "model_name": "XX",
  "accuracy_percentage": 78.43,
  "results": [
    {
      "id": 1,
      "correct_answer": "C",
      "llm_answer": "C",
      "is_correct": 1
    }
  ]
}
```
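
As a quick sanity check on this format, the sketch below recomputes accuracy from the per-question `is_correct` flags so it can be compared with the stored `accuracy_percentage`. The function name `summarize_results` and the two-question sample are illustrative assumptions, not part of `graph_evaluation.py`.

```python
import json


def summarize_results(payload: dict) -> dict:
    """Recompute accuracy from the per-question is_correct flags."""
    results = payload["results"]
    n_correct = sum(1 for r in results if r["is_correct"] == 1)
    # Guard against an empty results list to avoid dividing by zero.
    accuracy = 100.0 * n_correct / len(results) if results else 0.0
    return {
        "model_name": payload["model_name"],
        "n_questions": len(results),
        "n_correct": n_correct,
        "accuracy_percentage": round(accuracy, 2),
    }


# Hypothetical two-question sample in the format shown above.
sample = json.loads("""
{
  "model_name": "XX",
  "accuracy_percentage": 50.0,
  "results": [
    {"id": 1, "correct_answer": "C", "llm_answer": "C", "is_correct": 1},
    {"id": 2, "correct_answer": "A", "llm_answer": "B", "is_correct": 0}
  ]
}
""")

summary = summarize_results(sample)
print(summary)
```

If the recomputed value differs from the stored `accuracy_percentage`, the results file is worth inspecting before it is evaluated.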

### Graph JSON
```json
{
  "edges": [
    {
      "source": 2643,
      "target": 5258,
      "weight": 0.85
    }
  ]
}
```
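
The edge list can be turned into a weighted adjacency map for neighbor lookups. This is a minimal sketch: the helper `build_adjacency` is hypothetical, and treating edges as undirected by default is an assumption, since the format above does not say whether `source` and `target` are ordered.

```python
from collections import defaultdict


def build_adjacency(graph: dict, directed: bool = False) -> dict:
    """Map each node ID to a dict of {neighbor ID: edge weight}."""
    adjacency = defaultdict(dict)
    for edge in graph["edges"]:
        source, target = edge["source"], edge["target"]
        weight = float(edge["weight"])
        adjacency[source][target] = weight
        if not directed:
            # Assumption: mirror the edge so lookups work from either endpoint.
            adjacency[target][source] = weight
    return dict(adjacency)


# The single-edge example from the format above.
graph = {"edges": [{"source": 2643, "target": 5258, "weight": 0.85}]}
adjacency = build_adjacency(graph)
print(adjacency)
```

Pass `directed=True` to keep only the `source` to `target` direction.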


## Output

```json
{
  "models": {
    "model.json": {
      "model_name": "XX",
      "accuracy": 78.43,
      "full_graph": {
        "GCS": 61.99,
        "KBS": 15.19
      },
      "categories": {
        "biology": {
          "accuracy": 83.14049586776859,
          "GCS": 69.18631256792561,
          "KBS": 19.545975246236168
        }
      }
    }
  }
}
```
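
To compare models or categories side by side, the nested output can be flattened into rows. The sketch below assumes the structure shown above; `flatten_report` is an illustrative helper, not a function provided by the tool.

```python
def flatten_report(report: dict) -> list:
    """Return (file_name, scope, accuracy, GCS, KBS) tuples, one per scope."""
    rows = []
    for file_name, entry in report["models"].items():
        full = entry["full_graph"]
        # Overall accuracy is stored on the model entry, not under full_graph.
        rows.append((file_name, "full_graph", entry["accuracy"], full["GCS"], full["KBS"]))
        for category, metrics in entry.get("categories", {}).items():
            rows.append((file_name, category, metrics["accuracy"], metrics["GCS"], metrics["KBS"]))
    return rows


# The example output from above, inlined so the sketch is self-contained.
report = {
    "models": {
        "model.json": {
            "model_name": "XX",
            "accuracy": 78.43,
            "full_graph": {"GCS": 61.99, "KBS": 15.19},
            "categories": {
                "biology": {
                    "accuracy": 83.14049586776859,
                    "GCS": 69.18631256792561,
                    "KBS": 19.545975246236168,
                }
            },
        }
    }
}

rows = flatten_report(report)
for file_name, scope, accuracy, gcs, kbs in rows:
    print(f"{file_name:12} {scope:12} acc={accuracy:6.2f} GCS={gcs:6.2f} KBS={kbs:6.2f}")
```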

## Metrics

- **GCS**: Graph Coherence Score. Measures how well model performance aligns with the graph structure.
- **KBS**: Measures how consistent model performance is across graph regions.
- **Alpha**: Weight parameter used in the variance calculation (default: 100.0). This is a setting, not a reported metric.

The complete dataset, including the graph structure, will be made publicly available upon acceptance.