# The-Geometry-of-Reasoning-Self-Evaluation-via-Layerwise-Trajectory-Evolution

Official implementation of the paper "The Geometry of Reasoning Self-Evaluation via Layerwise Trajectory Evolution".

## Overview

This repository implements a self-evaluation method for reasoning tasks based on layerwise trajectory evolution, as described in the paper. It supports evaluation on several reasoning datasets: GSM8K, MMLU, CommonsenseQA, TheoremQA, MGSM, and Belebele.

## Repository Structure

```
.
├── Data/            # Dataset files and data processing scripts
├── Evaluation/      # Evaluation utilities and metrics
├── Model/           # Model loading and inference
├── Scripts/         # Shell scripts for running experiments
├── config_pool.py   # Configuration settings
├── inference.py     # Inference pipeline
├── main.py          # Main entry point
├── prompt_pool.py   # Prompt templates and management
└── score.py         # Scoring and evaluation metrics
```

## Supported Datasets

- GSM8K (Grade School Math)
- MMLU (Massive Multitask Language Understanding)
- CommonsenseQA
- TheoremQA
- MGSM (Multilingual Grade School Math)
- Belebele

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Data Preprocessing
To preprocess the GSM8K dataset:
```bash
python Data/preprocess_gsm8k.py
```

### Running Inference
```bash
bash Scripts/llm_infer.sh
```

### Evaluation
```bash
bash Scripts/eval.sh
```
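
### Running All Steps
The three commands above can be chained from a single script. The following is a minimal sketch, not part of the repository: it assumes the commands are run from the repository root and take no arguments, and the names `PIPELINE_STEPS` and `run_pipeline` are hypothetical. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List, Sequence

# Commands copied from the Usage section; assumed to be run from the repo root.
PIPELINE_STEPS: List[List[str]] = [
    ["python", "Data/preprocess_gsm8k.py"],  # data preprocessing
    ["bash", "Scripts/llm_infer.sh"],        # inference
    ["bash", "Scripts/eval.sh"],             # evaluation
]


def run_pipeline(
    steps: Sequence[Sequence[str]] = PIPELINE_STEPS,
    dry_run: bool = True,
) -> List[str]:
    """Run each step in order (hypothetical helper).

    With dry_run=True the commands are only printed. With dry_run=False,
    each command is executed and the pipeline stops at the first failure.
    Returns the command lines that were processed.
    """
    processed = []
    for cmd in steps:
        line = " ".join(cmd)
        print(("[dry-run] " if dry_run else "[run] ") + line)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(list(cmd), check=True)
        processed.append(line)
    return processed


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Call `run_pipeline(dry_run=False)` to actually execute the steps; stopping at the first failure avoids evaluating stale or missing inference outputs.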

## Citation

If you find this code useful for your research, please consider citing:

```
[Citation will be added upon paper publication]
```

## License

[License information to be added]
