# TMPC

This is the implementation code for the paper: Test-Time Alignment for Large Language Models via Textual Model Predictive Control.

## 🛠 Environment Setup

This document provides a step-by-step guide for setting up the environment required to run TMPC efficiently. Please follow the instructions below to ensure a smooth installation process.

### 1. Create a Conda Virtual Environment

It is highly recommended to use a Conda virtual environment to manage dependencies and avoid conflicts. Execute the following commands:

```bash
conda create --name TMPC python=3.9
conda activate TMPC
```

### 2. Install Dependencies: spaCy, VecAlign and LASER

Text segmentation is handled by **spaCy**. Please refer to the [spaCy Installation Guide](https://spacy.io/usage) to install the relevant language models. For the translation task, you can update the supported translation languages via the `lang_map` dictionary in `TMPC_wmt.py` (Line 81).

For the translation task, TMPC relies on **VecAlign** and **LASER** for segment alignment. Follow the installation instructions in the [VecAlign GitHub Repository](https://github.com/thompsonb/vecalign).

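LASER requires the `LASER` environment variable to point to its installation directory. Add the export line to `~/.bashrc` and reload it:

```bash
# Append the export line to ~/.bashrc so it persists across sessions
echo 'export LASER="{PATH_TO_LASER}"' >> ~/.bashrc
# Reload the configuration in the current shell
source ~/.bashrc
```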

Make sure to replace `{PATH_TO_LASER}` with the actual path where LASER is installed.
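
Before running the translation task, it can help to confirm that the variable is visible to Python. The following is a minimal sketch, not part of the TMPC code base: `check_laser_env` is a hypothetical helper, and it only checks that `LASER` is set and points to an existing directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_laser_env(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the LASER directory, or raise if the variable is unset or invalid.

    Hypothetical helper for illustration; TMPC itself may validate this differently.
    """
    env = os.environ if env is None else env
    laser = env.get("LASER", "")
    if not laser:
        raise RuntimeError("LASER is not set; export it in ~/.bashrc and reload the shell.")
    path = Path(laser).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"LASER points to a missing directory: {path}")
    return path
```

Calling `check_laser_env()` with no arguments reads the current process environment and raises a `RuntimeError` with a short explanation if the setup is incomplete.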

### 3. 🧠 Configure Reward Model

TMPC uses a reward model for test-time alignment. Set the following arguments before running:

For the HH_RLHF task: set `args.rm_path` to our Hugging Face repository.

For the translation task: set `args.rm` to our Hugging Face repository.

```python
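import argparse

parser = argparse.ArgumentParser()

# For HH_RLHF: path to the reward model
parser.add_argument("--rm_path", type=str, default='', help="reward model path")

# For translation: reward model to use (default is 'metricx')
parser.add_argument("--rm", type=str, default='metricx', help="Set the rm.")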
```

For the translation task, **MetricX-QE** is available as an alternative metric in place of the reward model.

### 4. 🚀 Running TMPC
To generate and save the responses for each prompt across iterations in the HH_RLHF task, run the following command:

```bash
python TMPC_hh.py \
    --input_file "hhrlhf.csv" \
    --output_folder "p2a" \
    --cuda_num 0
```

This processes the prompts from `hhrlhf.csv` and stores the responses in the folder given by `--output_folder` (here, `p2a`). Each prompt's responses are saved in a separate file: `prompt_0.json`, `prompt_1.json`, and so on.
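
To inspect the generated files afterwards, they can be loaded in prompt order. The sketch below is an illustration only: `load_prompt_files` is a hypothetical helper, and it makes no assumption about the JSON schema inside each file beyond it being valid JSON.

```python
import json
import re
from pathlib import Path


def load_prompt_files(output_folder: str) -> dict:
    """Map each prompt index to the parsed content of its prompt_<i>.json file."""
    pattern = re.compile(r"prompt_(\d+)\.json")
    results = {}
    for path in Path(output_folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match:  # skip anything that is not a prompt_<i>.json file
            with path.open(encoding="utf-8") as f:
                results[int(match.group(1))] = json.load(f)
    # Sort numerically so that prompt_10 comes after prompt_2.
    return dict(sorted(results.items()))
```

For example, `load_prompt_files("p2a")` returns a dictionary keyed by prompt index, which can then be explored interactively to see what each file contains.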


For the translation task, TMPC accepts a CSV file as input, with each column designated by a language code (e.g., `zh`, `en`). Specify the source and target languages via command-line arguments. For example, to perform an English-to-Japanese translation task using `valid_en_ja.csv`, execute:

```bash
python TMPC_wmt.py \
    --input_file "valid_en_ja.csv" \
    --rm "metricx" \
    --src_language English \
    --task_language Japanese \
    --threshold 0.7 \
    --max_iterations 6 \
    --good_ref_contexts_num 5 \
    --cuda_num 0
```
Results from each iteration and final outputs will be saved in a folder named after the input file (e.g., `valid_en_ja`).
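
The input layout described above, one column per language code, can be produced with Python's standard `csv` module. This is a minimal sketch under assumptions: `write_parallel_csv` and `read_column` are hypothetical helpers, the `en` and `ja` column names are inferred from the example file name, and whether TMPC requires the target-language column to be present is not confirmed here.

```python
import csv


def write_parallel_csv(path, rows, columns=("en", "ja")):
    """Write parallel text to a CSV with one column per language code."""
    # newline="" lets the csv module handle line breaks inside quoted cells.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(columns))
        writer.writeheader()
        for row in rows:
            writer.writerow({col: row[col] for col in columns})


def read_column(path, column):
    """Return every value from the column named by a language code."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]
```

Because the `csv` module quotes cells that contain commas or line breaks, multi-sentence or multi-paragraph source texts survive a write and read round trip unchanged.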

### 5. 📊 Evaluation Process

For the HH_RLHF task, use the same script with the evaluation flag. Set `--eval_input_folder` to the folder containing the generated prompt files (`prompt_0.json`, `prompt_1.json`, etc.).

Specify `--eval_it` to indicate which iteration to evaluate. The script will automatically select the best-performing response from all previous iterations up to the one specified.

```bash
python TMPC_hh.py \
    --evaluate \
    --eval_input_folder p2a \
    --eval_it 3 \
    --eval_range 1024 \
    --eval_cuda_num 0
```
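
The selection rule behind `--eval_it`, taking the best response among the iterations up to the specified one, can be sketched as follows. This is an illustration rather than the script's actual logic: `best_up_to` is a hypothetical function, iterations are assumed to be 0-indexed with the bound inclusive, and each iteration is assumed to carry a single `(response, score)` pair where a higher score is better.

```python
def best_up_to(iterations, eval_it):
    """Pick the highest-scoring (response, score) pair among iterations 0..eval_it.

    Assumes 0-based, inclusive indexing; the real script may index differently.
    """
    candidates = iterations[: eval_it + 1]
    if not candidates:
        raise ValueError("no iterations to evaluate")
    # max() returns the earliest pair when several share the top score.
    return max(candidates, key=lambda pair: pair[1])


# Example with made-up scores: iteration 3 is the best overall,
# but it is excluded when evaluating up to iteration 2.
history = [("draft A", 0.2), ("draft B", 0.9), ("draft C", 0.5), ("draft D", 0.95)]
selected = best_up_to(history, 2)  # ("draft B", 0.9)
```

Under this rule the selected score never decreases as `eval_it` grows, since each larger bound only adds candidates.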

For the translation task, the results from each iteration are stored in separate folders. To merge the results from a specific iteration into a single CSV file, use `memory2csv.py`. For example, to merge the iteration 5 results with `valid_en_ja.csv` under an output column named `TMPC`:

```bash
python memory2csv.py \
    --num 5 \
    --input_csv valid_en_ja.csv  \
    --output_csv eval_en_ja.csv \
    --column_name TMPC
```

Then, evaluate the `TMPC` column with:

```bash
python long_context_eval.py \
    --file valid_en_ja.csv \
    --target_column TMPC \
    --save eval_en_ja \
    --src_language English \
    --task_language Japanese
```

The evaluation scores will be saved in the `eval_en_ja` folder as `evaluated_results_TMPC.csv`.


## 🔍 A Closer Look at Test-Time Alignment Methods
Test-time alignment is a promising direction for adapting language models without fine-tuning. In our work, we provide a comparative analysis of recent methods and observe that some approaches struggle to consistently improve base model performance. We suggest that this may be due to the limitations of guided decoding, which typically lacks mechanisms to revise earlier outputs.

Our method, TMPC, addresses these challenges through a lightweight planning-based approach that offers stronger performance under similar computational budgets.