# Compositional preference models for aligning LMs

We introduce Compositional Preference Models (CPMs), a novel framework for training robust and interpretable preference models.

Language model loading and text generation rely on the Hugging Face Transformers library.

## Usage

- ```feature_extract/annotation.sh```: Extract feature values using an LM
- ```mle-train/logistic_fits.sh```: Train a logistic classifier that combines the feature values into a single model
- ```reward_model/pm_training.sh```: Train a standard preference model
- ```mle-train/preference_evaluation.sh```: Evaluate preference alignment with an LLM
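
To give a rough idea of what "combining feature values into a single model" can look like, here is a minimal, dependency-free sketch. It is not the code in `mle-train/logistic_fits.sh`. It assumes each response has already been scored on a few features (the two features, their values, and the helper names `fit_logistic` and `preference_score` are hypothetical), and it assumes the classifier is fit on the feature difference between the two responses in a pair, which may differ from what the scripts actually do.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(diffs, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights (no bias term) by batch gradient descent.

    diffs  : list of feature-difference vectors, f(response_a) - f(response_b)
    labels : 1 if response_a was preferred, 0 if response_b was preferred
    """
    n_features = len(diffs[0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for x, y in zip(diffs, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xi in enumerate(x):
                grad[j] += (p - y) * xi
        weights = [w - lr * g / len(diffs) for w, g in zip(weights, grad)]
    return weights


def preference_score(weights, features):
    # Scalar score for one response: weighted sum of its feature values.
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical toy data: (features of response A, features of response B, label).
# The two features could stand for e.g. helpfulness and factuality.
pairs = [
    ([0.9, 0.8], [0.2, 0.1], 1),
    ([0.1, 0.3], [0.7, 0.9], 0),
    ([0.8, 0.4], [0.3, 0.2], 1),
    ([0.2, 0.2], [0.6, 0.5], 0),
]
diffs = [[a - b for a, b in zip(fa, fb)] for fa, fb, _ in pairs]
labels = [label for _, _, label in pairs]

weights = fit_logistic(diffs, labels)
print("feature weights:", weights)
print("score of a new response:", preference_score(weights, [0.7, 0.6]))
```

Because the final score is a weighted sum of named features, the learned weights can be read directly to see how much each feature contributes to the preference.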

