# Quantile-Guided Alignment (QA)

A framework for risk-calibrated language model alignment that allows fine-grained control over model outputs at specific quantiles of the reward distribution.

## Overview

Quantile-Guided Alignment (QA) enables precise control over the distribution of language model outputs, particularly in the high-risk tails. Standard RLHF approaches optimize average reward; QA instead enforces constraints at specific quantiles of the reward distribution, which provides a principled way to calibrate risk.
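To illustrate why average reward alone can hide tail risk, here is a minimal sketch in plain Python. It is not code from this repository: the `empirical_quantile` helper and the two reward samples are hypothetical and chosen only to make the point. The two samples have the same mean reward but very different lower quantiles.

```python
import math
from statistics import mean


def empirical_quantile(rewards, q):
    """Smallest reward r such that at least a fraction q of the samples are <= r."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(rewards)
    index = max(math.ceil(q * len(ordered)) - 1, 0)
    return ordered[index]


# Hypothetical reward samples (illustrative values, not experimental data).
policy_a = [0.5] * 20                  # consistently moderate rewards
policy_b = [0.75] * 18 + [-1.75] * 2   # usually better, occasionally very bad

mean_a, mean_b = mean(policy_a), mean(policy_b)
tail_a = empirical_quantile(policy_a, 0.1)
tail_b = empirical_quantile(policy_b, 0.1)

print(f"mean: A={mean_a}, B={mean_b}")
print(f"10% quantile: A={tail_a}, B={tail_b}")
```

An objective based only on the mean cannot distinguish these two policies, whereas a constraint on the 10% quantile separates them.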

## Key Features

- Alignment targets can be specified at any quantile of the reward distribution
- Multi-dimensional reward optimization with quantile constraints
- Compatible with both conversation and code-generation tasks
- Extends standard RLHF techniques via an augmented reward formulation
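As a rough sketch of what an augmented reward with a quantile constraint could look like, the snippet below uses a generic Lagrangian-style penalty with a dual-ascent update. This is an assumption for illustration, not necessarily the formulation implemented in `core/`; the function names, the `threshold`, `q`, and `step_size` parameters, and the sample batch are all hypothetical. The constraint here is "at most a fraction `q` of rewards fall below `threshold`", which is one way of requiring the `q`-quantile to reach the threshold.

```python
def augmented_rewards(rewards, threshold, multiplier):
    """Subtract a penalty from every reward that falls below the threshold."""
    return [r - multiplier if r < threshold else r for r in rewards]


def update_multiplier(multiplier, rewards, threshold, q, step_size=1.0):
    """Dual-ascent step: raise the penalty while more than a fraction q
    of rewards fall below the threshold, and lower it otherwise."""
    violation_rate = sum(r < threshold for r in rewards) / len(rewards)
    return max(0.0, multiplier + step_size * (violation_rate - q))


# Hypothetical batch: 10% of samples fall below the threshold of 0.0,
# while the target allows at most 5%.
batch = [0.75] * 18 + [-1.75] * 2
lam = update_multiplier(0.0, batch, threshold=0.0, q=0.05)
shaped = augmented_rewards(batch, threshold=0.0, multiplier=lam)
```

In a setup like this, the shaped rewards would be passed to an ordinary RLHF optimizer, and the multiplier would shrink back toward zero once the constraint is satisfied.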

## Repository Structure

- `core/`: Core alignment algorithms for quantile-guided optimization
- `models/`: Model implementations, including reward models
- `training/`: Training utilities for DPO and PPO with quantile constraints
- `data/`: Data handling utilities and datasets
- `visualization/`: Visualization tools for reward distributions
- `utils/`: General utility functions
- `cli/`: Command-line interface for running experiments
- `code-generation/`: Code generation task evaluation

## Abstract

Beyond Expectations: Quantile-Guided Alignment for Risk-Calibrated Language Models

Large language models can generate rare but catastrophic outputs, such as harmful conversations or insecure code. Existing Reinforcement Learning from Human Feedback (RLHF) methods typically maximize average reward, leaving high-risk tail events insufficiently controlled. We introduce Quantile-Guided Alignment (QA), a framework that allows users to specify desired improvements at any quantile, individually or across multiple reward dimensions, thus shifting the distribution of outputs with finer control toward safer, more desirable outcomes. The method extends standard RLHF via an augmented reward formulation that enforces quantile constraints. Experiments on conversation and code-generation tasks show that quantile alignment significantly enhances quality at targeted tails while maintaining overall performance. The results position QA as a principled route to risk-calibrated language models with tail-focused alignment.