Abstract: We propose Process-Aligned Learning (PAL), a general framework
that augments standard supervised training by incorporating intermediate behaviors (e.g. reasoning steps or action traces) into the
loss. Unlike traditional training that uses only final outputs, PAL
also aligns the model’s internal process with step-by-step demonstrations. This approach applies to diverse foundation models –
from language models to embodied agents and software-interaction
systems. We provide theoretical analysis showing that supervising
intermediate steps reduces spurious correlations and enforces structured representations, improving robustness and out-of-distribution
generalization. Empirically, we evaluate PAL on real-world tasks
(language reasoning with supervised chains-of-thought; web navigation and software workflows with human-like action traces) as
well as synthetic benchmarks (algorithmic arithmetic, logical puzzles, grid navigation). In all cases, PAL-trained models generalize to
longer, noisier, or novel inputs significantly better than baselines.
Our method is distinct from prior chain-of-thought or Reinforcement Learning from Human Feedback (RLHF) approaches, offering a
unified training paradigm with new auxiliary loss functions and
consistency objectives. We discuss practical data collection strategies and note challenges (e.g. noisy step annotations) for future
work.
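To make the training objective concrete, the sketch below shows one plausible instantiation of a PAL-style loss: a standard supervised cross-entropy on the final output plus a weighted auxiliary cross-entropy aligning intermediate steps with step-by-step demonstrations. The function name pal_loss, the weighting coefficient lambda_steps, and the token-level formulation are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def pal_loss(final_logits, final_targets, step_logits, step_targets,
             lambda_steps=0.5):
    """Hypothetical PAL-style objective (illustrative sketch).

    Combines the usual supervised loss on final outputs with an
    auxiliary loss on intermediate behaviors (e.g. reasoning tokens
    or action traces). The decomposition and lambda_steps weighting
    are assumptions for illustration.
    """
    # Standard supervised cross-entropy on the final output tokens.
    loss_final = F.cross_entropy(
        final_logits.reshape(-1, final_logits.size(-1)),
        final_targets.reshape(-1))
    # Auxiliary cross-entropy aligning predicted intermediate steps
    # with step-by-step demonstrations.
    loss_steps = F.cross_entropy(
        step_logits.reshape(-1, step_logits.size(-1)),
        step_targets.reshape(-1))
    return loss_final + lambda_steps * loss_steps
```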