# T-POP: Test-Time Personalization with Online Preference Feedback

## Setup

Install dependencies:
```bash
pip install -r requirements.txt
```

## Training and scoring

```bash
python imple_armo.py \
  --llm_path "path/to/your/llm" \
  --openai_api_key "your-api-key" \
  --preference_model_name "your-judge-model" \
  --preference_api_base_url "your-api-url" \
  --output_dir "results/" \
  --data_file "data/personal_preference_eval_preference_data.json" \
  --attribute "creative" \
  --train_samples 100 \
  --reward_weight 1.0
```
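To compare several reward weights, the command above can be generated programmatically. The following is a minimal sketch, assuming `imple_armo.py` accepts the flags exactly as shown; the helper name `build_train_command` and the list of weights are hypothetical, and the placeholder values must be replaced with your own.

```python
import shlex

def build_train_command(reward_weight, attribute="creative", output_dir="results/"):
    """Return the training command as an argument list (hypothetical helper)."""
    return [
        "python", "imple_armo.py",
        "--llm_path", "path/to/your/llm",            # placeholder from the README
        "--openai_api_key", "your-api-key",          # placeholder from the README
        "--preference_model_name", "your-judge-model",
        "--preference_api_base_url", "your-api-url",
        "--output_dir", output_dir,
        "--data_file", "data/personal_preference_eval_preference_data.json",
        "--attribute", attribute,
        "--train_samples", "100",
        "--reward_weight", str(reward_weight),
    ]

if __name__ == "__main__":
    # Example weights are an assumption, not values recommended by the authors.
    for rw in (0.5, 1.0, 2.0):
        print(shlex.join(build_train_command(rw)))
```

Each printed line can be pasted into a shell, or the list can be passed to `subprocess.run`. Whether repeated runs overwrite files in the same `--output_dir` is not documented here, so a separate directory per run may be the safer choice.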

## GPT-4o evaluation

```bash
python winrate_evaluator.py \
  --ours_file "results/responses/responses_armo_rw_1.00.jsonl" \
  --baseline_file "baseline_responses.jsonl" \
  --attribute "creative" \
  --openai_api_key "your-api-key" \
  --judge_model "your-judge-model" \
  --api_base_url "your-api-url" \
  --output_file "evaluation_results.txt"
```

## Results

Training results are saved in the specified `--output_dir`:
- `responses/responses_armo_rw_X.XX.jsonl`: Generated responses
- `result.txt`: Experiment summary and scores
- `checkpoints/final_model.pt`: Trained reward model
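The generated responses can be inspected with a few lines of Python. This is a sketch under two assumptions: that the `X.XX` in the file name is the reward weight formatted to two decimals (inferred from `responses_armo_rw_1.00.jsonl` for `--reward_weight 1.0`), and that the file holds one JSON object per line. The helper names `responses_path` and `load_jsonl` are hypothetical, and no particular field names are assumed.

```python
import json
from pathlib import Path

def responses_path(output_dir, reward_weight):
    """Build the expected responses file path (two-decimal naming is an assumption)."""
    name = f"responses_armo_rw_{reward_weight:.2f}.jsonl"
    return Path(output_dir) / "responses" / name

def load_jsonl(path):
    """Read a JSON Lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

if __name__ == "__main__":
    path = responses_path("results/", 1.0)
    if path.exists():
        records = load_jsonl(path)
        print(f"{len(records)} records in {path}")
    else:
        print(f"{path} not found; run training first")
```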

GPT-4o evaluation results, including win-rate statistics, are written to the file given by `--output_file`.