Process Aligned Learning: Supervise the Journey, Not Just the Destination

Published: 05 Nov 2025 · Last Modified: 27 Jan 2026 · MLPR · CC BY 4.0
Abstract: We propose Process-Aligned Learning (PAL), a general framework that augments standard supervised training by incorporating intermediate behaviors (e.g., reasoning steps or action traces) into the loss. Unlike traditional training that uses only final outputs, PAL also aligns the model's internal process with step-by-step demonstrations. This approach applies to diverse foundation models – from language models to embodied agents and software-interaction systems. We provide theoretical analysis showing that supervising intermediate steps reduces spurious correlations and enforces structured representations, improving robustness and out-of-distribution generalization. Empirically, we evaluate PAL on real-world tasks (language reasoning with supervised chains of thought; web navigation and software workflows with human-like action traces) as well as synthetic benchmarks (algorithmic arithmetic, logical puzzles, grid navigation). In all cases, PAL-trained models generalize to longer, noisier, or novel inputs significantly better than baselines. Our method is distinct from prior chain-of-thought or Reinforcement Learning from Human Feedback (RLHF) approaches, offering a unified training paradigm with new auxiliary loss functions and consistency objectives. We discuss practical data collection strategies and note challenges (e.g., noisy step annotations) for future work.
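The augmented objective the abstract describes – a final-output loss plus an auxiliary loss over intermediate steps – might be sketched as follows. This is a minimal illustration under assumed conventions (the function name `pal_loss`, the weight `lam`, and per-step cross-entropy averaging are illustrative choices, not the paper's exact formulation):

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def pal_loss(final_probs, final_target, step_probs, step_targets, lam=0.5):
    """Hypothetical PAL-style objective: final-output loss plus a weighted
    average loss over annotated intermediate steps (process supervision)."""
    loss_final = cross_entropy(final_probs, final_target)
    loss_steps = sum(
        cross_entropy(p, t) for p, t in zip(step_probs, step_targets)
    ) / max(len(step_targets), 1)
    return loss_final + lam * loss_steps

# Toy usage: a model that predicts the final answer confidently but gets an
# intermediate step wrong is penalized more than one whose steps also match.
good_steps = pal_loss([0.1, 0.9], 1, [[0.9, 0.1]], [0])
bad_steps = pal_loss([0.1, 0.9], 1, [[0.1, 0.9]], [0])
```

With `lam = 0`, this reduces to ordinary outcome-only supervision; increasing `lam` trades off final-answer fit against alignment with the demonstrated process.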