# NeurIPS 2025 Submission 21592 (Supplementary Material - Code & Experiment Data)

## Are Large Reasoning Models Good Translation Evaluators? Analysis and Performance Boost

This repository contains the code and experiment data for our NeurIPS 2025 submission.

---

## Overview

The `code` directory is laid out as follows:

```
.
├── code
│   ├── README.md                
│   ├── esa_rescore.py           # ESA rescoring logic
│   ├── esa_utils.py             # Utility functions for ESA rescoring and data cleaning
│   ├── gemba_mqm_utils.py       # Utilities for GEMBA-MQM data processing
│   ├── mqm_ex_templates.py      # MQM evaluation prompt templates and ICL examples
│   ├── mqm_score.py             # Main script for MQM scoring and evaluation
│   ├── thinmqm_score.py         # ThinMQM scoring
│   ├── thinmqm_utils.py         # Helper functions for ThinMQM scoring
│   ├── train
│   │   ├── LLaMA_Factory        # Model training backbone (submodule or plain directory)
│   │   ├── ds_z3_config.json    # DeepSpeed training configuration
│   │   └── thin_mqm_32b.yaml    # Training hyperparameters for the ThinMQM model
│   └── utils
│       ├── rule_postscore.py    # Post-processing rules for scoring outputs
│       └── thinmqm_extract.py   # Extraction utilities for ThinMQM results
```

The experiment data is not part of the `code` directory; it is provided in a separate folder of the supplementary material.

---

## Dependencies

The required Python packages are listed in `requirements.txt`.
