# Readme for Supplementary Material

This supplementary material provides resources and instructions for reproducing the core components of our paper. It is organized into three parts:

1. **Data Construction**
2. **Model Training**
3. **Model Evaluation**

## 1. Data Construction

The data construction code will be released on GitHub once the paper is accepted for publication.

## 2. Model Training

For model training, we use [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), a flexible framework for fine-tuning large language models. Please refer to its documentation for details on usage and customization.

## 3. Model Evaluation

Model evaluation is divided into two parts:

- **In-Domain (ID) Evaluation:**  
  We use a custom-designed self-assessment dataset for in-domain evaluation. The evaluation scripts are provided in the [`code`](./code) folder, and sample data in the [`test_set`](./test_set) folder.

- **Out-of-Domain (OOD) Evaluation:**  
  For OOD evaluation, we adopt the [FollowBench](https://github.com/YJiangcm/FollowBench) benchmark. Please refer to their repository for setup and evaluation instructions.

## Environment Setup

The steps below recreate the conda environment used for our experiments:

1. Create a new conda environment:
   ```bash
   conda create --name recast python=3.10.16
   ```

2. Activate the environment:
   ```bash
   conda activate recast
   ```
   
3. Install the required dependencies from our `requirements.txt` file:
   ```bash
   pip install -r requirements.txt
   ```
   
## Code Overview

The `code` folder contains several important components for constructing and evaluating rule-based constraints as well as performing in-domain evaluation:

- **`template.py`**  
  This file provides a template for defining **rule-based constraints**. It serves as a starting point for specifying custom constraint logic in a structured format.

- **`util.py`**  
  This file defines:
  - Utility functions required to **construct rule-based constraints**, and  
  - Functions to **validate whether the rule-based constraints are satisfied** during evaluation.

- **`evaluate.py`**  
  Implements the evaluation logic for our **in-domain (ID) test set**. You can run this script to assess model performance against our custom-designed test cases.
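
To illustrate what a rule-based constraint and its validator can look like, here is a minimal sketch. The names `Constraint`, `max_words`, `must_include`, and `check_all` are hypothetical and are not taken from `template.py` or `util.py`; consult those files for the actual interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Constraint:
    """Hypothetical rule-based constraint: a description plus a checker."""
    description: str
    check: Callable[[str], bool]


def max_words(limit: int) -> Constraint:
    """Build a constraint satisfied when the response has at most `limit` words."""
    return Constraint(
        description=f"Use at most {limit} words.",
        check=lambda response: len(response.split()) <= limit,
    )


def must_include(keyword: str) -> Constraint:
    """Build a constraint satisfied when `keyword` appears (case-insensitive)."""
    return Constraint(
        description=f"Include the keyword '{keyword}'.",
        check=lambda response: keyword.lower() in response.lower(),
    )


def check_all(response: str, constraints: List[Constraint]) -> List[bool]:
    """Return one pass/fail flag per constraint, in the order given."""
    return [c.check(response) for c in constraints]


# Example: two illustrative constraints applied to one response.
constraints = [max_words(5), must_include("conda")]
results = check_all("Activate the conda environment first.", constraints)
```

Because each constraint pairs a human-readable description with a deterministic check, the same object can be used both to phrase an instruction and to verify a model response against it.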

## Test Data and Evaluation Results

- The **`test_set`** folder contains test files used for in-domain evaluation.  
  We provide **four test sets**, each representing a different **level of complexity** in the evaluation criteria.

- Evaluation results for a 10-sample test case (`test_number_10.json`) are included in the **`evaluation`** folder.
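
To inspect a test or result file before running the evaluation, a short helper like the one below may be useful. It assumes only that the file is valid JSON; the schema is not described in this README, so the hypothetical `summarize_json` function reports just the top-level type and size.

```python
import json
from pathlib import Path
from typing import Any, Dict


def summarize_json(path: str) -> Dict[str, Any]:
    """Report the top-level type and number of entries of a JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary: Dict[str, Any] = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["entries"] = len(data)
    if isinstance(data, dict):
        # Show at most ten keys so large files stay readable.
        summary["keys"] = sorted(data)[:10]
    return summary
```

Call it with the path to one of the JSON files in the `test_set` or `evaluation` folder and print the returned dictionary.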

---
## Running the Evaluation

To run the in-domain evaluation, pass two arguments to the `evaluate.py` script:

- `--api-key`: Your API key for accessing the model endpoint.
- `--api-url`: The base URL of the model API.
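
For readers adapting the script, the sketch below shows one common way to accept these two flags with `argparse`, falling back to environment variables so the key need not appear in shell history. This is an assumption about typical usage, not the actual implementation in `evaluate.py`; the environment variable names `EVAL_API_KEY` and `EVAL_API_URL` are hypothetical.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse --api-key and --api-url, with hypothetical env-var fallbacks."""
    parser = argparse.ArgumentParser(description="In-domain evaluation (sketch)")
    parser.add_argument(
        "--api-key",
        default=os.environ.get("EVAL_API_KEY"),  # hypothetical variable name
        help="API key for accessing the model endpoint",
    )
    parser.add_argument(
        "--api-url",
        default=os.environ.get("EVAL_API_URL"),  # hypothetical variable name
        help="Base URL of the model API",
    )
    args = parser.parse_args(argv)
    # Fail early with a clear message if either value is missing.
    if not args.api_key or not args.api_url:
        parser.error("both --api-key and --api-url are required")
    return args
```

Checking for both values up front avoids a failed request partway through an evaluation run.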

### Example

```bash
python evaluate.py --api-key YOUR_API_KEY --api-url YOUR_API_URL
```

