# Dyck Language Reasoning Errors Identification

This project implements a Dyck language reasoning error identification game to test large language models' ability to recognize errors in bracket matching reasoning processes.

## Task Description

Dyck language is a formal language consisting of various bracket pairs that must be correctly closed. This task requires the model to identify errors in the reasoning process of bracket matching.

Specifically, given:
1. A valid Dyck sequence composed of multiple types of brackets (such as (), [], {}, <>)
2. A series of thinking steps analyzing the sequence, some of which may contain errors

The model needs to identify which thinking steps contain errors and output the numbers of the error steps in a comma-separated format. If all steps are correct, output an empty string "".

## Example

```
You are an expert in Dyck language, in which you must complete language sequences of all types of unclosed brackets (e.g., [], {}, <>). You need to analyze whether the steps for matching brackets are correct according to Dyck language rules.

Given an initial Dyck language sequence and steps used to derive the closed bracket sequence (presented as a thought process), your task is to determine the position of the first erroneous reasoning in the Dyck language, if any.

This could be forgetting to close a bracket, using the wrong closing bracket, or incorrectly copying a subsequence of closed brackets in the next step.

Task: Check the sequence to ensure brackets are correctly closed.
Input: ( < < > > ) [ { { } } ]
Thought 1: We should process the input one by one and track the stack configuration.
Thought 2: Stack: empty
Thought 3: ( ; Stack: (
Thought 4: < ; Stack: (<
Thought 5: < ; Stack: (< <
Thought 6: > ; Stack: (<
Thought 7: > ; Stack: empty
Thought 8: ) ; Stack: empty
Thought 9: [ ; Stack: [
Thought 10: { ; Stack: [ {
Thought 11: { ; Stack: [ { {
Thought 12: } ; Stack: [ { 
Thought 13: } ; Stack: [ 
Thought 14: ] ; Stack: empty
Thought 15: Now, we've reached the end. The final stack is empty.
Question: Are there any reasoning errors in this sequence? If there are no errors, output an empty string ""; if there are errors, output the number of the error step, which appears in "Thought process N".

Additional condition: If there are multiple errors, please output in this format: 7,9,12
```

In this example, the thought process is correct, so the answer should be an empty string "".

## Game Parameters

- `n_types`: Number of bracket types (1-4)
- `total_length`: Total length of the bracket sequence
- `num_of_questions`: Number of samples to generate
- `n_errors`: Number of erroneous thought processes, default is randomly generating 1-5 errors

## Quick Start

### Generate Multiple Parameter Test Data

```bash
# Basic usage, will generate 5 different parameter configurations
./run.sh
```

By default, the script will generate data for the following 5 parameter configurations, 1000 entries each:
- n_types=2, total_length=10, n_errors=1
- n_types=2, total_length=15, n_errors=1
- n_types=3, total_length=15, n_errors=2
- n_types=3, total_length=20, n_errors=2
- n_types=4, total_length=20, n_errors=2

### Generate Data Separately

You can also use the generation script separately to generate data with specific parameters:
```bash
python -m games.tasks.dyck_language_reasoning_errors.scripts.dyck_language_reasoning_errors --num_of_data 50 --n_types 4 --total_length 16 --n_errors 3
```