Steering Back-Propagation with Prior Information in Natural Language

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Prior-Guided Tuning, Prior-based Gradient Editing, Large Language Models
Abstract: Large language models (LLMs) often struggle when task-relevant prior knowledge is missing or incorrect, leading to overfitting and hallucinations, especially on tasks with ambiguous or sparse data. While simple prompt concatenation can supply priors, it fails to fundamentally reshape the model's internal representations and yields only marginal gains. We propose Prior-Guided Tuning (PGT), a paradigm that explicitly integrates natural language priors into the optimization landscape and steers the back-propagation training process. Under this paradigm, we introduce Prior-based Gradient Editing (PGE), which computes losses for positive (correct) and negative (misleading) prior prompts appended to the original inputs and adds their difference as an extra term in the gradient update. These auxiliary losses steer the model to internalize the desired priors and improve task performance. Empirically, PGE-trained models outperform baselines on both a synthetic mathematical benchmark and real-world datasets (Jigsaw and BEAD), producing substantial gains in learning performance and efficiency. Ablations confirm that priors must be presented together with the original training data to be effective, and attention visualizations show that PGE-trained models tend to attend more to prior-relevant tokens. Our code and data will be made publicly available.
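The gradient-editing rule described in the abstract can be sketched on a toy model. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar logistic model, the names `pge_step`, `lam`, and the prior-augmented inputs `x_pos`/`x_neg` are all hypothetical stand-ins for the prior-prompt-augmented sequences used in the paper.

```python
import math

def loss_and_grad(w, x, y):
    """Binary cross-entropy loss and gradient for a 1-D logistic model p = sigmoid(w*x)."""
    p = 1.0 / (1.0 + math.exp(-w * x))
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = (p - y) * x
    return loss, grad

def pge_step(w, x, y, x_pos, x_neg, lr=0.1, lam=0.5):
    """One update: task gradient plus lam * (positive-prior grad - negative-prior grad)."""
    _, g_task = loss_and_grad(w, x, y)
    _, g_pos = loss_and_grad(w, x_pos, y)   # input with correct prior appended
    _, g_neg = loss_and_grad(w, x_neg, y)   # input with misleading prior appended
    return w - lr * (g_task + lam * (g_pos - g_neg))

w = 0.0
for _ in range(100):
    w = pge_step(w, x=1.0, y=1, x_pos=1.5, x_neg=0.5)
```

The key design choice mirrored here is that the positive- and negative-prior losses enter the update only through their *difference*, so gradients shared by both prior variants cancel and only the prior-specific signal edits the task gradient.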
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15023