# Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

This folder contains supplementary materials for our NeurIPS 2025 submission, "Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles".

## Data Availability

Enigmata is a benchmark that evaluates the reasoning capabilities of large language models across seven distinct puzzle categories. The complete benchmark dataset will be publicly released once the paper is published. This folder currently contains only a subset of sample tasks for demonstration purposes. The full dataset will include:
- Complete task sets for all puzzle categories
- Detailed task descriptions and rules
- Evaluation metrics and baselines
- Additional task variations and difficulty levels

## Task Categories

The benchmark consists of the following puzzle categories:

1. **Crypto Puzzles**
   - Evaluates cryptographic understanding and pattern recognition
   - Sample task: Crypto KPA

2. **Arithmetic Puzzles**
   - Focuses on numerical reasoning and arithmetic operations
   - Sample task: Game24

3. **Logic Puzzles**
   - Assesses deductive reasoning and inference capabilities
   - Sample task: Zebra Logic

4. **Grid Puzzles**
   - Challenges models with structured grid-based problems
   - Sample task: Sudoku

5. **Graph Puzzles**
   - Focuses on reasoning about nodes, edges, and paths
   - Sample task: Hamiltonian Path

6. **Search Puzzles**
   - Tests efficient exploration of state spaces
   - Sample task: Car Painting

7. **Sequential Puzzles**
   - Focuses on sequence understanding and prediction
   - Sample task: Sixteen Puzzle
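
To illustrate what "verifiable" means for a puzzle such as Game24, here is a minimal sketch of a rule-based answer checker. It assumes the common formulation of the game (combine the given integers with `+`, `-`, `*`, `/`, using each exactly once, to reach 24); the exact rules, answer format, and verifier used in Enigmata may differ, and the function name `check_game24` is hypothetical.

```python
import ast
import operator
from collections import Counter
from fractions import Fraction

# Only the four basic arithmetic operators are accepted (an assumption).
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def check_game24(numbers, expression, target=24):
    """Return True if `expression` uses each number exactly once and equals `target`."""
    try:
        tree = ast.parse(expression, mode="eval")
    except (SyntaxError, ValueError):
        return False

    used = []  # integer literals encountered in the expression

    def evaluate(node):
        # Integer literal: record it and return it as an exact fraction.
        if isinstance(node, ast.Constant) and type(node.value) is int:
            used.append(node.value)
            return Fraction(node.value)
        # Binary operation with one of the allowed operators.
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = evaluate(node.left)
            right = evaluate(node.right)
            return OPS[type(node.op)](left, right)
        raise ValueError("unsupported syntax")

    try:
        value = evaluate(tree.body)
    except (ValueError, ZeroDivisionError):
        return False

    # Exact comparison via Fraction avoids floating-point error.
    return value == target and Counter(used) == Counter(numbers)
```

For example, `check_game24([3, 3, 8, 8], "8 / (3 - 8 / 3)")` accepts the answer, while an expression that omits or repeats a number is rejected. Exact rational arithmetic is used so that answers relying on fractional intermediate values are judged correctly.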

## Data Format

The sample data is provided in JSONL format (one JSON object per line). Each record contains:
- Task description and rules
- Input data
- Expected output
- Metadata including difficulty level and task type
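
As a minimal sketch of how such a file might be read, the snippet below parses JSONL text into Python dictionaries and lists the field names it finds. The helper names (`parse_jsonl`, `load_jsonl`, `field_names`) are hypothetical, and no particular key names are assumed, since the exact schema is not specified here.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text into a list of records, skipping blank lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        records.append(json.loads(line))
    return records


def load_jsonl(path):
    """Read a JSONL file from disk and return its records."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle.read())


def field_names(records):
    """Return the sorted set of top-level keys across all dict records."""
    names = set()
    for record in records:
        if isinstance(record, dict):
            names.update(record.keys())
    return sorted(names)
```

Calling `field_names(load_jsonl("Enigmata_sample_data.jsonl"))` is one way to discover the actual keys before writing any code that depends on them.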

## Usage

1. Sample data can be found in `Enigmata_sample_data.jsonl`
2. Additional task examples are available in the `Enigmata_sample_tasks/` directory
3. The results of Qwen2.5-32B-Enigmata are provided in `Qwen2.5-32B-Enigmata_sample_result.jsonl`

